GCS (Batch and Streaming) Data Source
Gathr provides batch and streaming GCS (Google Cloud Storage) channels.
The GCS channel reads data in formats including JSON, CSV, TEXT, XML, Fixed Length, Binary, Parquet, and ORC.
The configuration for the GCS data source is specified below:
Field | Description |
---|---|
Connection Name | Select the GCP connection name to use for establishing the connection. |
Override Credentials | Check this option for user-specific actions. |
Service Account Key File | GCP service account key file used to create the connection. |
Bucket Name | Provide the name of the Google Cloud Storage bucket. |
Path | Provide the end path, using * in the case of a directory. For example, outdir.* |
File Filter | Provide a file pattern. Only files whose names match the pattern are included. For example, *.pdf or *emp *.csv |
Recursive File Lookup | Check the option to retrieve the files from current/sub-folder(s). |
File Filter and Recursive File Lookup are available when the Binary format is selected.
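The interaction between File Filter and Recursive File Lookup can be illustrated with a small, self-contained sketch. It is not Gathr's implementation: it assumes the filter behaves like a shell-style glob matched against the file name only, and that recursive lookup simply extends the search to sub-folders. The function `select_files`, the sample object names, and the pattern `*emp*.csv` are hypothetical and used only for illustration.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def select_files(object_names, base_dir, file_filter="*", recursive=False):
    """Return object names under base_dir whose file name matches file_filter.

    Assumption: the filter is a shell-style glob applied to the file name,
    not to the full path. This mirrors the description above, not Gathr's code.
    """
    base = base_dir.strip("/")
    prefix = base + "/" if base else ""
    selected = []
    for name in object_names:
        if not name.startswith(prefix):
            continue  # outside the configured path
        relative = name[len(prefix):]
        is_nested = "/" in relative
        if is_nested and not recursive:
            continue  # sub-folder files are skipped unless recursion is on
        if fnmatchcase(basename(relative), file_filter):
            selected.append(name)
    return selected


# Hypothetical bucket listing.
objects = [
    "outdir/report.pdf",
    "outdir/emp_2023.csv",
    "outdir/archive/old_emp.csv",
    "outdir/archive/notes.txt",
]

pdf_only = select_files(objects, "outdir", "*.pdf")
emp_flat = select_files(objects, "outdir", "*emp*.csv")
emp_recursive = select_files(objects, "outdir", "*emp*.csv", recursive=True)
print(pdf_only)
print(emp_flat)
print(emp_recursive)
```

With recursion off, only files directly under `outdir` are candidates; turning it on also picks up matching files in `outdir/archive`.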
The user can add configuration by clicking the ADD CONFIGURATION button.
Next, in the Detect Schema window, the user can save the schema as a dataset by selecting the Save As Dataset checkbox.
The Incremental Read option is available in the GCS batch data source, but not in the GCS streaming channel.