BigQuery Data Asset Source

Create a Data Asset Through BigQuery

To create a data asset through BigQuery Source, configure parameters as follows:

Connection Name

Connections are the service identifiers.

Select a connection name from the list if you have already created and saved BigQuery connection details.

Otherwise, create one as explained in the topic BigQuery Connection.

Load From Big Query Table/ Load From Query Results

Choose one of the options.

If you select Load From Big Query Table, fill in the following fields.

Dataset Name

Provide the name of the BigQuery dataset that contains the table.

Table Name

Provide the name of the BigQuery table to load.

Max No of Rows

Specify the maximum number of sample records you wish to keep in the data asset.

This feature helps in obtaining a manageable subset of data for testing and design purposes, facilitating efficient application development while optimizing resource usage.

Sampling Method

This option offers flexibility in how you retrieve sample data.

The following methods are available:

Top N: Retrieve the first N records from the data source, where N is the value set in Max No of Rows. This is particularly useful when you want to analyze or design with a specific set of initial records.
Random Sample: Fetch a random subset of records from your sample data, ensuring a diverse representation. This approach is valuable when you require a more comprehensive assessment of your data’s characteristics.
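
The difference between the two sampling methods can be illustrated with a minimal Python sketch. It runs on an in-memory list standing in for source rows and is not how Gathr itself implements sampling; the function names `top_n` and `random_sample` are hypothetical.

```python
import random


def top_n(records, max_rows):
    """Return the first max_rows records, in source order (Top N)."""
    return records[:max_rows]


def random_sample(records, max_rows, seed=None):
    """Return up to max_rows records picked at random, without repeats."""
    rng = random.Random(seed)  # a seed makes the sample reproducible
    k = min(max_rows, len(records))
    return rng.sample(records, k)


# Stand-in for 100 rows read from a table (hypothetical data).
records = list(range(1, 101))

first_five = top_n(records, 5)                     # always the same five rows
random_five = random_sample(records, 5, seed=42)   # five rows from anywhere
```

Top N always returns the same leading rows, while Random Sample can draw rows from any position in the source, which is why it tends to represent the data more broadly.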

More Configurations

Expand the More Configurations option to see the additional configuration parameters.

Project ID of Dataset

Provide the Google Cloud project ID of the dataset. If it is not specified, the project from the connection's service account key is used.

Columns to Fetch

Provide a comma-separated list of the columns to select.

Where Condition

Provide the WHERE condition used to filter the rows that are read.

Partition Filter Condition

Provide the partition filter condition, which restricts the table partitions that are read.
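
To show how these fields relate to one another, the following Python sketch combines them into a single BigQuery SQL statement. This is an illustration only: the documentation does not state how Gathr composes its request, and the helper `build_query`, its parameters, and the sample project, dataset, and table names are all hypothetical.

```python
def build_query(dataset, table, project_id=None, columns=None,
                where=None, partition_filter=None, max_rows=None):
    """Assemble a SELECT statement from the form fields (illustrative only)."""
    # Project ID is optional; without it the table is referenced as dataset.table.
    table_ref = ".".join(p for p in (project_id, dataset, table) if p)

    # Columns to Fetch: comma-separated list, or all columns when left empty.
    if columns:
        select_list = ", ".join(c.strip() for c in columns.split(","))
    else:
        select_list = "*"

    sql = f"SELECT {select_list} FROM `{table_ref}`"

    # Partition Filter Condition and Where Condition are combined with AND.
    conditions = [f"({c})" for c in (partition_filter, where) if c]
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)

    # Max No of Rows caps the number of returned records.
    if max_rows is not None:
        sql += f" LIMIT {int(max_rows)}"
    return sql


query = build_query(
    dataset="sales",
    table="orders",
    project_id="my-project",
    columns="order_id, amount",
    where="amount > 100",
    partition_filter="order_date >= '2024-01-01'",
    max_rows=1000,
)
print(query)
```

Selecting only the needed columns and filtering on the partition column both reduce the amount of data BigQuery has to scan.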

If the option Load From Query Results is selected, then you need to provide a query to be executed in BigQuery along with the ‘Location of Datasets’ used in the query.

Maximum Parallelism

Specify the maximum number of partitions to split the data into.
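
As a rough illustration of what an upper bound on partitions means, the Python sketch below splits a list of rows into at most a given number of near-equal chunks. The actual partitioning strategy used by Gathr is not described here, so treat `split_into_partitions` as a hypothetical helper, not the product's behavior.

```python
def split_into_partitions(rows, max_parallelism):
    """Split rows into at most max_parallelism contiguous, near-equal chunks."""
    if max_parallelism < 1:
        raise ValueError("max_parallelism must be at least 1")

    # Never create more partitions than there are rows (but at least one).
    n = min(max_parallelism, len(rows)) or 1
    size, extra = divmod(len(rows), n)

    partitions = []
    start = 0
    for i in range(n):
        # The first `extra` partitions take one additional row each.
        end = start + size + (1 if i < extra else 0)
        partitions.append(rows[start:end])
        start = end
    return partitions


rows = list(range(10))  # stand-in for ten source rows
parts = split_into_partitions(rows, 4)
```

With ten rows and a maximum parallelism of 4, this sketch yields four partitions of sizes 3, 3, 2, and 2. A higher value allows more work to run concurrently, at the cost of more, smaller partitions.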
