# GCS Emitter
Gathr provides a GCS emitter for writing pipeline data to Google Cloud Storage. The configuration details for the GCS emitter are described below:
| Field | Description |
|---|---|
| Save as Dataset | Select the checkbox to save the schema as a dataset, and provide the dataset name. |
| Connection Name | Choose the connection name from the drop-down to establish the connection. |
| Override Credentials | Select the checkbox to override the default connection credentials with user-specific credentials. |
| Service Account Key File | Upload the GCP service account key file used to create the connection. You can test the connection by clicking the TEST CONNECTION button (see the connection sketch after the table). |
| Bucket Name | Name of the GCS bucket to write to. |
| Path | Sub-directories of the bucket named above to which the data is to be written. |
| Output Type | Select the output format in which the results will be processed. |
| Delimiter | Select the message field separator. |
| Output Fields | Select the fields in the message that need to be part of the output data. |
| Partitioning Required | Select the checkbox to partition the data. |
| Partition Columns | Select the fields on which the data will be partitioned. |
| Save Mode | Specifies the expected behavior when saving data to the data sink. **ErrorIfExists**: if the data already exists, an exception is thrown. **Append**: if data/table already exists, the contents are appended to the existing data. **Overwrite**: if data/table already exists, the existing data is overwritten by the new contents. **Ignore**: if data/table already exists, the save operation neither saves the new contents nor changes the existing data, similar to CREATE TABLE IF NOT EXISTS in SQL. See the write sketch after the table. |
| Checkpoint Storage Location | Select the checkpointing storage location. The available options are S3, HDFS, and EFS. |
| Checkpoint Connections | Select the connection from the drop-down list. Connections are listed corresponding to the selected storage location. |
| Override Credentials | Select the checkbox to override the default connection credentials with user-specific credentials. |
| Username | The name of the user under which the Hadoop service is running. Click the TEST CONNECTION button to test the connection. |
| Checkpoint Directory | The path where the Spark application stores its checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system adds a suitable prefix by itself. For S3, enter an absolute path such as s3://BucketName/checkpointingDir. |
| Time-Based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e. on each run the checkpoint location provided above is suffixed with the current time in milliseconds. |
| Enable Trigger | A trigger defines how frequently a streaming query should be executed. |
| Trigger Type | The available options in the drop-down are One Time Micro Batch and Fixed Interval Micro Batch (see the streaming sketch after the table). |
| ADD CONFIGURATION | Optionally add further Spark configurations as required by clicking the ADD CONFIGURATION button. For example, imputation can be performed by replacing nullValue/emptyValue with an entered value across the data: with nullValue = 123, the output replaces all null values with 123. |
| ENVIRONMENT PARAMS | Click the + ADD PARAM button to add further parameters as key-value pairs. |
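
The connection fields above (Connection Name, Service Account Key File) map onto the authentication settings of the open-source Hadoop GCS connector. Below is a minimal sketch of an equivalent Spark-side setup, assuming the GCS connector jar is on the classpath; the key file path is a placeholder, the configuration keys are those documented for the connector, and Gathr's internal wiring may differ:

```python
from pyspark.sql import SparkSession

# Equivalent of Connection Name + Service Account Key File:
# authenticate the GCS connector with a GCP service account JSON key.
spark = (
    SparkSession.builder
    .appName("gcs-connection-sketch")
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            "/path/to/service-account-key.json")  # placeholder path
    .getOrCreate()
)
```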
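
Output Type, Delimiter, Partition Columns, Save Mode, and the nullValue imputation example correspond directly to Spark DataFrame write options. A minimal PySpark write sketch, assuming a hypothetical bucket name and path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-write-sketch").getOrCreate()

# Small sample frame with a null value to show the nullValue option.
df = spark.createDataFrame(
    [("alice", "IN", 10), ("bob", "US", None)],
    ["name", "country", "score"],
)

(
    df.write
    .format("csv")                    # Output Type
    .option("delimiter", ",")         # Delimiter
    .option("nullValue", "123")       # ADD CONFIGURATION example: nulls written as 123
    .partitionBy("country")           # Partitioning Required + Partition Columns
    .mode("overwrite")                # Save Mode: errorifexists | append | overwrite | ignore
    .save("gs://my-bucket/output/")   # gs://<Bucket Name>/<Path> (placeholders)
)
```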
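
Checkpoint Directory, Time-Based Check Point, and Trigger Type behave like Spark Structured Streaming's checkpointLocation and trigger settings. A streaming sketch using Spark's built-in rate source as a stand-in input; the sink and checkpoint paths are placeholders, and the millisecond suffix illustrates the time-based checkpoint option:

```python
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-stream-sketch").getOrCreate()

# Stand-in streaming source for illustration.
stream_df = spark.readStream.format("rate").load()

# Checkpoint Directory: relative for HDFS/EFS, absolute (s3://...) for S3.
checkpoint_dir = "/user/hadoop/checkpointingDir"

# Time-Based Check Point: suffix the location with the current time in millis.
checkpoint_dir = f"{checkpoint_dir}/{int(time.time() * 1000)}"

query = (
    stream_df.writeStream
    .format("csv")
    .option("path", "gs://my-bucket/output/")      # placeholder sink path
    .option("checkpointLocation", checkpoint_dir)
    .trigger(processingTime="1 minute")            # Fixed Interval Micro Batch
    # .trigger(once=True)                          # One Time Micro Batch
    .start()
)
query.awaitTermination()
```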