BigQuery Ingestion Source

The BigQuery data source allows you to read data from the Google BigQuery data warehouse.

Data Source Configuration

Fetch From Source/Upload Data File

While designing the application, you can either fetch sample data from the BigQuery source by providing the data source connection details, or upload a sample data file in one of the supported formats to view the schema details during application design.

If Upload Data File is selected to fetch sample data, provide the following details.

File Format: Select the sample file format (file type) depending on the data type.

Gathr-supported file formats for BigQuery data source are CSV, JSON, TEXT, Parquet and ORC.

For the CSV file format, also select its corresponding delimiter.

Header Included: Enable this option to read the first row as a header if your BigQuery data is in CSV format.

Upload: Upload a sample file that matches the file format selected above.

If Fetch From Source is selected, continue configuring the data source.

Connection Name: Connections are service identifiers. Select a connection name from the list if you have previously created and saved connection details for BigQuery, or create one as explained in the topic - BigQuery Connection →

Use the Test Connection option to ensure that the connection with the BigQuery channel is established successfully.

A success message confirms that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding.

Load From Big Query Table / Load From Query Results

Choose one of the above options.

If the option Load From Big Query Table is selected, then proceed by updating the following fields.

Dataset Name: Provide the name of the BigQuery dataset to read from.

Table Name: Provide the name of the table to read from.

Add Configuration: Additional properties can be added using this option as key-value pairs.
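As an illustration, and assuming Gathr passes these pairs through to the underlying Spark BigQuery connector (an assumption; verify against your deployment), key-value pairs could look like:

```
viewsEnabled   = true    # connector option to allow reading from BigQuery views (assumed pass-through)
readDataFormat = ARROW   # data format used for reads: ARROW or AVRO (assumed pass-through)
```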

More Configurations

Project ID of Dataset: Provide the Google Cloud project ID of the dataset. If not specified, the project from the connection's service account key is used.

Columns to Fetch: Provide a comma-separated list of the columns to select.

Where Condition: Provide a WHERE condition to filter the rows to be fetched.

Partition Filter Condition: Provide a filter condition on the partitioning column for partitioned tables.
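For illustration, the filter fields above could be filled in as follows. The column names are hypothetical; `_PARTITIONDATE` is BigQuery's pseudo-column for ingestion-time date-partitioned tables:

```sql
-- Columns to Fetch (comma-separated list of columns; names are hypothetical):
id, customer_name, order_total

-- Where Condition (filters the rows fetched from the table):
order_total > 100

-- Partition Filter Condition (for a date-partitioned table):
_PARTITIONDATE >= '2024-01-01'
```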

Maximum Parallelism: Mention the maximum number of partitions to split the data into.

Preferred Minimum Parallelism: The preferred minimum number of partitions to split the data into. This value should be less than or equal to Maximum Parallelism.

Schema Results

Under schema results, select the Big Query Dataset Name and Big Query Table Name.

Details: Under details, you can view the following:

  • Table Expiration

  • Number of Rows

  • Last Modified

  • Data Location

  • Table ID

  • Table Size

  • Created

Table Schema: Table schema details can be viewed.

If the option Load From Query Results is selected, provide a query to be executed in BigQuery along with the ‘Location of Datasets’ used in the query.
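The ‘Location of Datasets’ value is a standard BigQuery location, for example:

```
US             # multi-region
EU             # multi-region
asia-south1    # a specific region
```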

Additionally, while using the incremental read feature with read control set to limit by count, add an ORDER BY clause to the query so that data is fetched from the source table in sequential order.

Example:

select * from projectId.datasetName.tableName order by <column for offset>

Schema

Check the populated schema details. For more details, see Schema Preview →

Advanced Configuration

Optionally, you can enable incremental read. For more details, see BigQuery Incremental Configuration →
