BigQuery ETL Source
The BigQuery data source allows you to read data from the BigQuery data warehouse.
Schema Type
See the topic Provide Schema for ETL Source → to learn how schema details can be provided for data sources.
After providing schema type details, the next step is to configure the data source.
Data Source Configuration
Connection Name: Connections are the service identifiers. A connection name can be selected from the list if you have created and saved connection details for BigQuery earlier. Alternatively, create one as explained in the topic - BigQuery Connection →
Load From Big Query Table / Load From Query Results: Choose one of the options.
If the option Load From Big Query Table is selected, then proceed by updating the following fields.
Dataset Name: Name of the BigQuery dataset that contains the table to be read.
Table Name: Name of the BigQuery table to read data from.
Project ID of Dataset: The Google Cloud project ID of the dataset. If not specified, the project from the service account key of the connection will be used.
Columns to Fetch: Comma-separated list of columns to select from the table.
Where Condition: Condition used to filter the rows to be read.
Partition Filter Condition: Condition used to filter the table partitions to be read.
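Taken together, these fields determine what is read from the table. As a rough illustration with hypothetical values (Dataset Name: sales, Table Name: orders, Columns to Fetch: order_id, amount, Where Condition: amount > 0, Partition Filter Condition: order_date = '2024-01-01'), the resulting read would be roughly equivalent to:
select order_id, amount from projectId.sales.orders where order_date = '2024-01-01' and amount > 0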
If the option Load From Query Results is selected, then you need to provide a query to be executed in BigQuery along with the ‘Location of Datasets’ used in the query.
Additionally, when using the incremental read feature with the read control set to Limit by Count, you need to add an ORDER BY clause to the query so that data is fetched from the source table in sequential order.
Example:
select * from projectId.datasetName.tableName order by <column for offset>
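Since the query is executed in BigQuery, it can reference more than one table. For a fuller illustration, assuming hypothetical tables orders and customers and a hypothetical timestamp column updated_at used as the offset column, a query could look like:
select o.order_id, o.amount, c.region
from projectId.datasetName.orders o
join projectId.datasetName.customers c on o.customer_id = c.customer_id
where o.amount > 0
order by o.updated_at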
Maximum Parallelism: The maximum number of partitions to split the data into.
Preferred Minimum Parallelism: The preferred minimum number of partitions to split the data into. This value should be less than or equal to Maximum Parallelism.
Add Configuration: Additional configuration properties can be added, if required.
Schema Results: Under Schema Results, select the Big Query Dataset Name and Big Query Table Name.
Details: Under Details, you can view the following:
Table Expiration
Number of Rows
Last Modified
Data Location
Table ID
Table Size
Created
Table Schema: Table schema details can be viewed here.
Detect Schema
Check the populated schema details. For more details, see Schema Preview →
Incremental Read
Optionally, you can enable incremental read. For more details, see BigQuery Incremental Configuration →
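Conceptually, incremental read re-runs the source with an additional filter on the configured offset column, so that only rows newer than the last recorded offset are fetched on each run. Using the same placeholder style as the example above, each subsequent read is roughly equivalent to:
select * from projectId.datasetName.tableName where <column for offset> > <last recorded offset> order by <column for offset>
Refer to the linked topic for the exact offset options and behavior.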
Pre Action
To understand how to provide SQL queries or Stored Procedures that will be executed during the pipeline run, see Pre-Actions →
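For illustration only, using hypothetical object names, a pre-action could be a cleanup statement or a stored procedure call, for example:
delete from projectId.datasetName.staging_orders where load_date < current_date()
call projectId.datasetName.refresh_lookup_tables()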
Notes
Optionally, enter notes in the Notes → tab and save the configuration.