GCS Emitter

Gathr provides a GCS (Google Cloud Storage) emitter. The configuration details for the GCS emitter are described below:

Save as Dataset

Select the checkbox to save the schema as a dataset, and mention the dataset name.

Connection Name

Choose the connection name from the drop-down list to establish the connection.

Override Credentials

Check the checkbox for user-specific actions.

Service Account Key File

Upload the GCP Service Account Key File to create the connection. You can test the connection by clicking the TEST CONNECTION button.
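Gathr creates this connection for you, but for reference, the sketch below shows roughly how a Spark session is pointed at GCS with a service account key file when using the open-source gcs-connector. The connector availability, configuration keys, and file path here are assumptions for illustration, not Gathr's internal implementation.

```python
from pyspark.sql import SparkSession

# A minimal sketch, assuming the gcs-connector JAR is on the classpath.
# The key file path is a placeholder for the uploaded Service Account Key File.
spark = (
    SparkSession.builder.appName("gcs-connection-sketch")
    # Register the GCS filesystem implementation for gs:// paths
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.GoogleHadoopFileSystem")
    # Authenticate with the service account key file
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            "/path/to/service-account-key.json")
    .getOrCreate()
)
```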
Bucket Name

Mention the bucket name.

Path

Mention the sub-directories of the bucket mentioned above to which the data is to be written.

Output Type

Select the output format in which the results will be processed.

Delimiter

Select the message field separator.

Output Fields

Select the fields in the message that need to be part of the output data.

Partitioning Required

Check the checkbox to partition the data.

Partition Columns

Option to select the fields on which the data will be partitioned.
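For reference, the sketch below shows how these write options would typically map onto a plain Spark batch write to GCS. It is a minimal illustration only; the bucket, path, and column names are placeholder assumptions, and Gathr generates the actual write internally.

```python
from pyspark.sql import SparkSession

# A minimal sketch; bucket, path, and column names are placeholders.
spark = SparkSession.builder.appName("gcs-write-sketch").getOrCreate()
df = spark.createDataFrame(
    [(1, "a", "2024-01-01"), (2, "b", "2024-01-02")],
    ["id", "name", "event_date"],
)

(df.select("id", "name", "event_date")  # Output Fields
   .write
   .format("csv")                       # Output Type
   .option("delimiter", ",")            # Delimiter
   .partitionBy("event_date")           # Partition Columns
   .save("gs://my-bucket/events/"))     # gs://<Bucket Name>/<Path>
```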
Save Mode

Save Mode is used to specify the expected behavior of saving data to a data sink.

ErrorIfExists: When persisting data, if the data already exists, an exception is expected to be thrown.

Append: When persisting data, if the data/table already exists, the contents of the DataFrame are expected to be appended to the existing data.

Overwrite: When persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame.

Ignore: When persisting data, if the data/table already exists, the save operation is expected not to save the contents of the DataFrame and not to change the existing data.

This is similar to a CREATE TABLE IF NOT EXISTS in SQL.
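In plain Spark terms, these choices correspond to DataFrameWriter.mode(); below is a minimal sketch, reusing the df and bucket placeholders from the write sketch above.

```python
# Each Save Mode maps to a DataFrameWriter.mode() value; pick one.
mode = "append"  # or "errorifexists", "overwrite", "ignore"
df.write.mode(mode).format("csv").save("gs://my-bucket/events/")
```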

Checkpoint Storage Location

Select the checkpointing storage location. The available options are S3, HDFS, and EFS.

Checkpoint Connections

Select the connection from the drop-down list. Connections are listed corresponding to the selected storage location.

Override Credentials

Check the checkbox for user-specific actions.
Username

The name of the user under which the Hadoop service is running.

Checkpoint Directory

It is the path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself.

For S3, enter an absolute path like: S3://BucketName/checkpointingDir

Time-Based Checkpoint

Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above will be appended with the current time in milliseconds.
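For reference, the sketch below shows how a checkpoint location, including the optional time-in-millis suffix, is typically passed to a Spark Structured Streaming writer. The rate source, paths, and sink are placeholder assumptions for illustration.

```python
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("checkpoint-sketch").getOrCreate()
streaming_df = spark.readStream.format("rate").load()  # toy streaming source

# Placeholder: for S3, an absolute path; for HDFS/EFS, a relative path
# such as /user/hadoop/checkpointingDir (the system adds the prefix).
checkpoint_base = "s3://my-bucket/checkpointingDir"

# Time-based checkpoint: append the current time in millis on each run.
checkpoint_dir = f"{checkpoint_base}/{int(time.time() * 1000)}"

query = (streaming_df.writeStream
         .format("parquet")
         .option("checkpointLocation", checkpoint_dir)
         .option("path", "gs://my-bucket/events/")
         .start())
```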
Enable Trigger

A trigger defines how frequently a streaming query should be executed.
Trigger Type

Available options in drop-down are:

One Time Micro Batch

Fixed Interval Micro Batch
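In Spark Structured Streaming these correspond to the one-time and processing-time triggers; below is a minimal sketch, continuing the streaming example above.

```python
# Continuing the streaming sketch above; pick one trigger type.

# One Time Micro Batch: process all available data once, then stop.
writer_once = streaming_df.writeStream.trigger(once=True)

# Fixed Interval Micro Batch: fire a micro-batch every 30 seconds.
writer_fixed = streaming_df.writeStream.trigger(processingTime="30 seconds")
```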

ADD CONFIGURATION

Users can add further configurations (optional).

Example: perform imputation by clicking the ADD CONFIGURATION button. With nullValue = 123, the output will replace all null values with 123.
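The nullValue example behaves like Spark's CSV writer option of the same name; below is a minimal sketch, reusing the df placeholder from the write sketch above.

```python
# Sketch of the nullValue = 123 example: for CSV output, Spark's
# "nullValue" option writes the given string wherever a field is null.
(df.write
   .format("csv")
   .option("nullValue", "123")  # nulls appear as 123 in the output
   .mode("overwrite")
   .save("gs://my-bucket/events-imputed/"))
```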

ENVIRONMENT PARAMS

Click the + ADD PARAM button to add further parameters as key-value pairs.