GCS Emitter

Gathr provides a GCS emitter. Its configuration details are described below:

Save as Dataset: Select the checkbox to save the schema as a dataset, then enter the dataset name.
Connection Name: Choose the connection name from the drop-down list to establish the connection.
Override Credentials: Select the checkbox to override the connection credentials for user-specific actions.
Service Account Key File: Upload the GCP service account key file to create the connection. Click the TEST CONNECTION button to test the connection.
Bucket Name: Enter the bucket name.
Path: Enter the sub-directories of the above bucket to which the data is to be written.
Output Type: Select the output format in which the results will be written.
Delimiter: Select the message field separator.
Output Fields: Select the message fields that need to be part of the output data.
Partitioning Required: Select the checkbox to partition the data.
Partition Columns: Select the fields on which the data will be partitioned.
Save Mode

Save Mode is used to specify the expected behavior of saving data to a data sink.

ErrorifExist: When persisting data, if the data already exists, an exception is thrown.

Append: When persisting data, if the data/table already exists, the contents being written are appended to the existing data.

Overwrite: When persisting data, if the data/table already exists, the existing data is overwritten by the contents being written.

Ignore: When persisting data, if the data/table already exists, the save operation does not write the new contents and does not change the existing data.

This is similar to CREATE TABLE IF NOT EXISTS in SQL.
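
The four save modes above can be illustrated with a small, self-contained sketch. It models the sink as an in-memory dictionary and is not Gathr's or Spark's implementation; the names `SaveMode`, `save`, and `sink` are hypothetical and exist only for this illustration.

```python
from enum import Enum


class SaveMode(Enum):
    # Labels follow the option names used in this section.
    ERROR_IF_EXISTS = "ErrorifExist"
    APPEND = "Append"
    OVERWRITE = "Overwrite"
    IGNORE = "Ignore"


def save(sink, path, rows, mode):
    """Write rows to an in-memory sink (dict of path -> list of rows).

    Returns the rows stored at `path` after the operation.
    """
    if path not in sink:
        # Nothing exists yet: every mode simply writes the data.
        sink[path] = list(rows)
    elif mode is SaveMode.ERROR_IF_EXISTS:
        raise FileExistsError(f"data already exists at {path}")
    elif mode is SaveMode.APPEND:
        sink[path].extend(rows)
    elif mode is SaveMode.OVERWRITE:
        sink[path] = list(rows)
    elif mode is SaveMode.IGNORE:
        pass  # keep existing data untouched, drop the new rows
    return sink[path]


sink = {}
save(sink, "out/events", [1, 2], SaveMode.ERROR_IF_EXISTS)  # first write succeeds
save(sink, "out/events", [3], SaveMode.APPEND)              # existing rows are kept, 3 is added
save(sink, "out/events", [9], SaveMode.IGNORE)              # no change
save(sink, "out/events", [7], SaveMode.OVERWRITE)           # existing rows are replaced
```

Only ErrorifExist fails on existing data; Ignore is the silent counterpart that leaves the sink unchanged.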

Check point Storage Location: Select the checkpointing storage location. The available options are S3, HDFS, and EFS.
Check point Connections: Select the connection from the drop-down list. Connections are listed according to the selected storage location.
Override Credentials: Select the checkbox to override the connection credentials for user-specific actions.

The name of the user through which the Hadoop service is running.

Checkpoint Directory

The path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself.

For S3, enter an absolute path like: S3://BucketName/checkpointingDir
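
The path rules above can be checked with a short sketch. This is an illustration only, not how Gathr validates its input: the function name `resolve_checkpoint_dir` and its parameters are hypothetical, and the way the millisecond timestamp is appended for the Time-Based Check Point option (as a trailing sub-directory) is an assumption.

```python
import time


def resolve_checkpoint_dir(storage, path, time_based=False, now_millis=None):
    """Validate a checkpoint path for the selected storage location.

    storage: "S3", "HDFS" or "EFS".
    time_based: when True, append the current time in milliseconds.
    now_millis: optional fixed timestamp, useful for reproducible runs.
    """
    storage = storage.upper()
    if storage == "S3":
        # S3 expects an absolute path such as S3://BucketName/checkpointingDir
        if not path.lower().startswith("s3://"):
            raise ValueError("S3 requires an absolute path starting with S3://")
    elif storage in ("HDFS", "EFS"):
        # HDFS and EFS expect a path without a scheme; the prefix is added later.
        if "://" in path:
            raise ValueError(f"{storage} requires a relative path without a scheme")
    else:
        raise ValueError(f"unsupported storage location: {storage}")

    path = path.rstrip("/")
    if time_based:
        millis = now_millis if now_millis is not None else int(time.time() * 1000)
        # Assumption: the timestamp becomes a trailing sub-directory.
        path = f"{path}/{millis}"
    return path


hdfs_dir = resolve_checkpoint_dir("HDFS", "/user/hadoop/checkpointingDir")
s3_dir = resolve_checkpoint_dir(
    "S3", "S3://BucketName/checkpointingDir", time_based=True, now_millis=1700000000000
)
```

With a time-based suffix, each pipeline run gets its own checkpoint location, so a new run does not resume from the state of an earlier one.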

Time-Based Check Point: Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., in each pipeline run the checkpoint location provided above is appended with the current time in milliseconds.
Enable Trigger: A trigger defines how frequently a streaming query should be executed.
Trigger Type

The options available in the drop-down list are:

One Time Micro Batch

Fixed Interval Micro Batch


You can optionally add further configurations.

Example: To perform imputation, click the ADD CONFIGURATION button.

Example: With nullValue = 123, all null values in the output are replaced with 123.
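
The effect of the nullValue example can be shown with a minimal standalone sketch. It uses Python's standard csv module rather than the emitter itself, so it only mimics the behavior described above; the helper `write_csv` and the sample rows are hypothetical.

```python
import csv
import io


def write_csv(rows, fields, null_value=""):
    """Serialize dict rows to CSV text, writing null_value in place of None."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for row in rows:
        # Missing keys and explicit None values are both treated as null.
        writer.writerow(
            [null_value if row.get(f) is None else row[f] for f in fields]
        )
    return buf.getvalue()


rows = [{"id": 1, "score": None}, {"id": 2, "score": 7}]
output = write_csv(rows, ["id", "score"], null_value="123")
print(output)
```

Note that after this substitution a genuine value of 123 cannot be distinguished from a replaced null, so it is worth choosing a replacement value that does not occur in the data.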

ENVIRONMENT PARAMS: Click the + ADD PARAM button to add further parameters as key-value pairs.