BigQuery ETL Source

The BigQuery data source lets you read data from the BigQuery data warehouse.

Schema Type

To learn how to provide schema details for data sources, see Provide Schema for ETL Source →

After providing schema type details, the next step is to configure the data source.

Data Source Configuration

Connection Name: Connections are the service identifiers. Select a connection name from the list if you have already created and saved BigQuery connection details, or create one as explained in BigQuery Connection →

Load From Big Query Table / Load From Query Results: Choose one of the two options.

If Load From Big Query Table is selected, fill in the following fields.

Dataset Name: The name of the BigQuery dataset that contains the table.

Table Name: The name of the table to read from.

Project ID of Dataset: The Google Cloud project ID of the dataset. If not specified, the project from the service account key of the connection is used.

Columns to Fetch: A comma-separated list of the columns to select.

Where Condition: The WHERE condition used to filter the rows that are read.

Partition Filter Condition: The filter condition to apply on the table's partitions.
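To make the relationship between these fields concrete, the sketch below composes an equivalent SELECT statement from them. It is illustrative only: the function name `build_table_query` is hypothetical, and the assumptions that an empty Columns to Fetch means all columns and that the two conditions are combined with AND are ours, not documented behavior of the data source.

```python
from typing import Optional


def build_table_query(
    dataset: str,
    table: str,
    project_id: Optional[str] = None,
    columns: Optional[str] = None,
    where: Optional[str] = None,
    partition_filter: Optional[str] = None,
) -> str:
    """Compose a SELECT statement from the table-load fields (illustrative only)."""
    # Assumption: an empty "Columns to Fetch" value selects all columns.
    cols = [c.strip() for c in (columns or "").split(",") if c.strip()]
    select_list = ", ".join(cols) if cols else "*"

    # Qualify the table with the project ID only when one is given.
    parts = [p for p in (project_id, dataset, table) if p]
    table_ref = "`" + ".".join(parts) + "`"

    # Assumption: the where condition and partition filter are combined with AND.
    conditions = [f"({c})" for c in (where, partition_filter) if c]
    sql = f"SELECT {select_list} FROM {table_ref}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql
```

For example, with hypothetical values, `build_table_query("sales", "orders", "my-project", "id, amount", "amount > 100", "order_date >= '2024-01-01'")` yields a query that selects two columns from `my-project.sales.orders` and applies both conditions.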

If the option Load From Query Results is selected, then you need to provide a query to be executed in BigQuery along with the ‘Location of Datasets’ used in the query.

Additionally, when using the incremental read feature with read control set to limit by count, add an ORDER BY clause to the query so that data is fetched from the source table in sequential order.

Example:

select * from projectId.datasetName.tableName order by <column for offset>
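The following sketch illustrates why a stable ordering matters when reads are limited by count. It simulates count-limited batches over in-memory rows and is not the data source's actual implementation; `read_batch`, its parameters, and the idea of tracking the last seen offset value are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def read_batch(
    rows: List[Row],
    offset_column: str,
    last_offset: Optional[Any],
    limit: int,
) -> Tuple[List[Row], Optional[Any]]:
    """Return the next `limit` rows after `last_offset`, ordered by the offset column."""
    # Equivalent of ORDER BY <column for offset>: makes batch boundaries deterministic.
    ordered = sorted(rows, key=lambda r: r[offset_column])
    # Keep only the rows that have not been read yet.
    pending = [r for r in ordered if last_offset is None or r[offset_column] > last_offset]
    batch = pending[:limit]
    # Remember the highest offset value read so the next run can resume from it.
    new_offset = batch[-1][offset_column] if batch else last_offset
    return batch, new_offset
```

Without the sort step, the first `limit` rows of each run could be any rows, so consecutive runs might skip or re-read records. With it, each run resumes exactly where the previous one stopped.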

Maximum Parallelism: The maximum number of partitions to split the data into.

Preferred Minimum Parallelism: The preferred minimum number of partitions to split the data into. This value should be less than or equal to the maximum parallelism.
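The constraint between the two parallelism values can be checked before saving a configuration. The helper below is a minimal sketch; `validate_parallelism` is a hypothetical name, and the requirement that both values be positive is an assumption rather than something stated for this data source.

```python
def validate_parallelism(maximum: int, preferred_minimum: int) -> None:
    """Raise ValueError if the parallelism settings are inconsistent."""
    # Assumption: a partition count below 1 is not meaningful.
    if maximum < 1 or preferred_minimum < 1:
        raise ValueError("parallelism values must be positive integers")
    # The preferred minimum must not exceed the maximum.
    if preferred_minimum > maximum:
        raise ValueError(
            "preferred minimum parallelism must be less than or equal to maximum parallelism"
        )
```

For instance, `validate_parallelism(8, 4)` passes silently, while `validate_parallelism(4, 8)` raises a `ValueError`.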

Add Configuration: Optionally, add further configurations.

Schema Results: Under schema results, select the Big Query Dataset name and Big Query Table Name.

Details: Under Details, you can view the following:

  • Table Expiration

  • Number of Rows

  • Last Modified

  • Data Location

  • Table ID

  • Table Size

  • Created

Table Schema: Table schema details can be viewed here.

Detect Schema

Check the populated schema details. For more details, see Schema Preview →

Incremental Read

Optionally, you can enable incremental read. For more details, see BigQuery Incremental Configuration →

Pre Action

To learn how to provide SQL queries or stored procedures that are executed during the pipeline run, see Pre-Actions →

Notes

Optionally, enter notes in the Notes → tab and save the configuration.
