Turnpike Processor
Turnpike is used with streaming datasets to bring the benefits of batch transformations into a streaming pipeline. It also lets the user perform sequential and priority-based execution of flows (Processors and Emitters).
The configuration details are as follows:
Field | Description |
---|---|
Output Mode | Output mode to be used while writing the data to the Streaming emitter. Select one of the three options: Append: only the new rows in the streaming data are written to the sink. Complete: all the rows in the streaming data are written to the sink every time there is an update. Update: only the rows that were updated in the streaming data are written to the sink every time there is an update. |
Checkpoint Storage Location | Select the checkpointing storage location. The available options are HDFS and S3. |
Checkpoint Connections | Select the connection. Connections are listed corresponding to the selected storage location. |
Override Credentials | Option to override credentials for user-specific actions. |
If the Checkpoint Storage Location is S3, provide the AWS Key ID (S3 account access key) and the Secret Access Key. | |
KeyTab Select Option | Select the option for providing the keytab. The available options are: Specify KeyTab File Path, Upload KeyTab File. |
Checkpoint Directory | The HDFS path where the Spark application stores the checkpoint data. |
Time-based Checkpoint | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is appended with the current time in milliseconds. |
Enable Trigger | A trigger defines how frequently the streaming query should be executed (see the sketch after the table). |
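The Output Mode, Checkpoint Directory, Time-based Checkpoint, and Enable Trigger fields correspond to standard Spark Structured Streaming query options. The sketch below illustrates that mapping only; it is not Gathr's internal implementation, and the rate source, console sink, and checkpoint path are placeholders chosen for illustration.

```python
# Minimal PySpark sketch (not Gathr internals) of how the table's settings map
# onto a Spark Structured Streaming query. The source, sink, and paths are
# placeholders for illustration.
import time

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("turnpike-sketch").getOrCreate()

# Any streaming source works here; the built-in rate source keeps the sketch self-contained.
stream_df = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Time-based Checkpoint: append the current time in milliseconds to the configured
# checkpoint location so that each pipeline run checkpoints into its own directory.
base_checkpoint = "hdfs:///user/gathr/checkpoints/turnpike"  # hypothetical HDFS path
checkpoint_dir = f"{base_checkpoint}/{int(time.time() * 1000)}"

query = (
    stream_df.writeStream
    .outputMode("append")                      # Output Mode: "append", "complete", or "update"
    .format("console")                         # stand-in for the downstream Streaming emitter
    .option("checkpointLocation", checkpoint_dir)
    .trigger(processingTime="30 seconds")      # Enable Trigger: how frequently the query fires
    .start()
)

query.awaitTermination()                       # block until the query is stopped
```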
Click the +ADD CONFIGURATION button to add further configurations as key-value pairs.
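Such key-value pairs, like the S3 credential override above, typically surface as ordinary Spark/Hadoop properties. The sketch below is a hypothetical example assuming the credentials are passed through as the standard Hadoop s3a properties; the property keys are standard Spark/Hadoop names, but whether Gathr forwards them in exactly this way is an assumption, and the values are placeholders.

```python
from pyspark.sql import SparkSession

# Hypothetical key-value configuration pairs; the s3a keys are standard Hadoop
# credential properties, and the values are placeholders rather than real secrets.
extra_configs = {
    "spark.hadoop.fs.s3a.access.key": "<AWS Key ID>",
    "spark.hadoop.fs.s3a.secret.key": "<Secret access key>",
}

builder = SparkSession.builder.appName("turnpike-s3-sketch")
for key, value in extra_configs.items():
    builder = builder.config(key, value)

spark = builder.getOrCreate()
```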