Cassandra Emitter
The Cassandra emitter allows you to store data in a Cassandra table.
Cassandra Emitter Configuration
To add a Cassandra emitter into your pipeline, drag the emitter to the canvas and connect it to a Data Source or processor.
The configuration settings are as follows:
Field | Description |
---|---|
Connection Name | All Cassandra connections will be listed here. Select a connection for connecting to Cassandra. |
KeySpace | Cassandra keyspace name. If the keyspace does not exist in Cassandra, a new keyspace will be created. |
Output Fields | Fields of the output message to be written to Cassandra. |
Key Columns | A single or compound primary key consisting of the partition key and one or more additional columns that determine clustering. |
Table Name Expression | Cassandra table name. If the table does not exist in the keyspace, a new table will be created. Tables can be created dynamically based on the field name provided in the table name expression. |
Consistency Level | Consistency level refers to how up-to-date and synchronized a row of Cassandra data is on all its replicas. Consistency levels are as follows: ONE: Only a single replica must respond. TWO: Two replicas must respond. THREE: Three replicas must respond. QUORUM: A majority (n/2 + 1) of the replicas must respond. ALL: All of the replicas must respond. LOCAL_QUORUM: A majority of the replicas in the local data center (whichever data center the coordinator is in) must respond. EACH_QUORUM: A majority of the replicas in each data center must respond. LOCAL_ONE: Only a single replica must respond. In a multi-data center cluster, this also guarantees that read requests are not sent to replicas in a remote data center. |
Replication Strategy | A replication strategy specifies the implementation class for determining the nodes where replicas are placed. Possible strategies are SimpleStrategy and NetworkTopologyStrategy. |
Replication Factor | The number of copies of the data maintained across the cluster. |
Enable TTL | Select the checkbox to enable TTL (Time to Live); records will persist only for that duration. |
TTL Value | Appears only when the Enable TTL checkbox is selected. Provide the TTL value in seconds. |
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. |
Checkpoint Connections | Select the connection. Connections are listed corresponding to the selected storage location. |
Checkpoint Directory | The path where the Spark application stores the checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3, enter an absolute path such as S3://BucketName/checkpointingDir |
Time-Based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above will be appended with the current time in milliseconds. |
Batch Size | Number of records to be picked for inserting into Cassandra. |
Output Mode | Output mode to be used while writing the data to the streaming sink. Select one of the three options: Append: only the new rows in the streaming data will be written to the sink. Complete: all the rows in the streaming data will be written to the sink every time there are updates. Update: only the rows that were updated in the streaming data will be written to the sink every time there are updates. |
Save Mode | Save Mode is used to specify the expected behavior of saving data to a data sink. ErrorIfExists: when persisting data, if the data already exists, an exception is expected to be thrown. Append: when persisting data, if the data/table already exists, the contents of the data are expected to be appended to the existing data. Overwrite: when persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the data. Ignore: when persisting data, if the data/table already exists, the save operation is expected to not save the contents of the data and to not change the existing data. This is similar to CREATE TABLE IF NOT EXISTS in SQL. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Processing Time | Appears only when the Enable Trigger checkbox is selected. Processing Time is the trigger time interval in minutes or seconds. |
Add Configuration | Enables you to configure additional Cassandra properties. |
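The KeySpace, Replication Strategy, Replication Factor, and TTL settings above map directly onto CQL. A minimal sketch of the kind of statements these settings imply, assuming hypothetical keyspace, table, and column names (this is illustrative only, not the emitter's actual implementation):

```python
# Illustrative sketch of the CQL implied by the emitter settings.
# The keyspace, table, and column names below are hypothetical examples.

def create_keyspace_cql(keyspace, strategy="SimpleStrategy", replication_factor=1):
    """CREATE KEYSPACE using the configured replication strategy and factor."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': '{strategy}', "
        f"'replication_factor': {replication_factor}}}"
    )

def insert_cql(keyspace, table, columns, ttl_seconds=None):
    """INSERT for the output fields; adds USING TTL when Enable TTL is set."""
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {keyspace}.{table} ({cols}) VALUES ({placeholders})"
    if ttl_seconds is not None:
        stmt += f" USING TTL {ttl_seconds}"
    return stmt

print(create_keyspace_cql("sensor_data", "NetworkTopologyStrategy", 3))
print(insert_cql("sensor_data", "readings", ["device_id", "ts", "value"], ttl_seconds=86400))
```

For example, with Enable TTL checked and a TTL Value of 86400, each inserted record expires after one day; with SimpleStrategy and a Replication Factor of 3, three copies of each row are kept in the cluster.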
If you have any feedback on Gathr documentation, please email us!