Cassandra Emitter
The Cassandra emitter allows you to store data in a Cassandra table.
Cassandra Emitter Configuration
To add a Cassandra emitter into your pipeline, drag the emitter to the canvas and connect it to a Data Source or processor.
The configuration settings are as follows:
Field | Description |
---|---|
Connection Name | All Cassandra connections will be listed here. Select a connection for connecting to Cassandra. |
KeySpace | Cassandra keyspace name. If the keyspace does not exist in Cassandra, a new keyspace will be created. |
Output Fields | Fields of the output message to be written to Cassandra. |
Key Columns | A single or compound primary key consisting of the partition key and one or more additional columns that determine clustering. |
Table Name Expression | Cassandra table name. If the table does not exist in the keyspace, a new table will be created. Tables can be created dynamically based on the field name provided in the table name expression. |
Consistency Level | Consistency level refers to how up-to-date and synchronized a row of Cassandra data is on all its replicas. Consistency levels are as follows: ONE: Only a single replica must respond. TWO: Two replicas must respond. THREE: Three replicas must respond. QUORUM: A majority (n/2 + 1) of the replicas must respond. ALL: All of the replicas must respond. LOCAL_QUORUM: A majority of the replicas in the local data center (whichever data center the coordinator is in) must respond. EACH_QUORUM: A majority of the replicas in each data center must respond. LOCAL_ONE: Only a single replica must respond. In a multi-data center cluster, this also guarantees that read requests are not sent to replicas in a remote data center. |
Replication Strategy | A replication strategy specifies the implementation class for determining the nodes where replicas are placed. Possible strategies are SimpleStrategy and NetworkTopologyStrategy. |
Replication Factor | The number of copies of the data maintained across the cluster. |
Enable TTL | Select the checkbox to enable TTL (Time to Live); records will persist only for that duration. |
TTL Value | Appears only when the Enable TTL checkbox is selected. Provide the TTL value in seconds. |
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. |
Checkpoint Connections | Select the connection. Connections are listed corresponding to the selected storage location. |
Checkpoint Directory | The path where the Spark application stores the checkpointing data. For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself. For S3, enter an absolute path such as S3://BucketName/checkpointingDir |
Time-Based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above will be appended with the current time in milliseconds. |
Batch Size | Number of records to be picked for inserting into Cassandra. |
Output Mode | Output mode to be used while writing the data to the streaming sink. Select one of the three options: Append: only the new rows in the streaming data will be written to the sink. Complete: all the rows in the streaming data will be written to the sink every time there are updates. Update: only the rows that were updated in the streaming data will be written to the sink every time there are updates. |
Save Mode | Save Mode is used to specify the expected behavior of saving data to a data sink. ErrorIfExists: when persisting data, if the data already exists, an exception is expected to be thrown. Append: when persisting data, if the data/table already exists, the contents of the data are expected to be appended to the existing data. Overwrite: when persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the data. Ignore: when persisting data, if the data/table already exists, the save operation is expected to not save the contents of the data and to not change the existing data. This is similar to CREATE TABLE IF NOT EXISTS in SQL. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Processing Time | Appears only when the Enable Trigger checkbox is selected. Processing Time is the trigger time interval in minutes or seconds. |
Add Configuration | Enables you to configure additional Cassandra properties. |
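The KeySpace, Replication Strategy, Replication Factor, and TTL settings above map directly onto CQL. A minimal sketch of the kind of statements these settings imply, assuming hypothetical keyspace, table, and column names (this is illustrative only, not the emitter's actual implementation):

```python
# Illustrative sketch of the CQL implied by the emitter settings.
# The keyspace, table, and column names below are hypothetical examples.

def create_keyspace_cql(keyspace, strategy="SimpleStrategy", replication_factor=1):
    """CREATE KEYSPACE using the configured replication strategy and factor."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': '{strategy}', "
        f"'replication_factor': {replication_factor}}}"
    )

def insert_cql(keyspace, table, columns, ttl_seconds=None):
    """INSERT for the output fields; adds USING TTL when Enable TTL is set."""
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {keyspace}.{table} ({cols}) VALUES ({placeholders})"
    if ttl_seconds is not None:
        stmt += f" USING TTL {ttl_seconds}"
    return stmt

print(create_keyspace_cql("sensor_data", "NetworkTopologyStrategy", 3))
print(insert_cql("sensor_data", "readings", ["device_id", "ts", "value"], ttl_seconds=86400))
```

For example, with Enable TTL checked and a TTL Value of 86400, each inserted record expires after one day; with SimpleStrategy and a Replication Factor of 3, three copies of each row are kept in the cluster.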
If you have any feedback on Gathr documentation, please email us!