Cassandra Emitter

The Cassandra emitter allows you to store data in a Cassandra table.

Cassandra Emitter Configuration

To add a Cassandra emitter to your pipeline, drag the emitter onto the canvas and connect it to a Data Source or Processor.

The configuration settings are as follows:

Field: Description
Connection Name: All Cassandra connections are listed here. Select the connection to use for connecting to Cassandra.
KeySpace: Cassandra keyspace name. If the keyspace does not exist in Cassandra, a new keyspace is created.
Output Fields: Fields of the output message.
Key Columns: Columns that form the primary key. A single or compound primary key consists of the partition key and, for a compound key, one or more additional columns that determine clustering.
Table Name Expression

Cassandra table name. If the table does not exist in the keyspace, a new table is created.
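To illustrate how the KeySpace, Table Name Expression, Output Fields, and Key Columns settings relate to a table definition, the following minimal sketch builds a CQL CREATE TABLE statement. The function name create_table_cql, the example keyspace, table, and column names, and the use of IF NOT EXISTS are assumptions made for illustration; the statement the emitter actually issues is not documented here.

```python
def create_table_cql(keyspace, table, columns, partition_keys, clustering_keys=()):
    """Build a CQL CREATE TABLE statement (hypothetical helper, not the emitter's code).

    columns maps column name -> CQL type, e.g. {"device_id": "text"}.
    partition_keys and clustering_keys are sequences of column names.
    """
    if not partition_keys:
        raise ValueError("at least one partition key column is required")
    col_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    # A composite partition key is wrapped in its own parentheses.
    if len(partition_keys) == 1:
        partition = partition_keys[0]
    else:
        partition = "(" + ", ".join(partition_keys) + ")"
    key_parts = [partition, *clustering_keys]
    primary_key = ", ".join(key_parts)
    return (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} "
        f"({col_defs}, PRIMARY KEY ({primary_key}))"
    )


# Example with assumed names: one partition key and one clustering column.
example_stmt = create_table_cql(
    "demo_ks",
    "events",
    {"device_id": "text", "ts": "timestamp", "value": "double"},
    partition_keys=["device_id"],
    clustering_keys=["ts"],
)
```

Here device_id plays the role of the partition key and ts is the additional column that determines clustering.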

Consistency Level

Consistency level refers to how up-to-date and synchronized a row of Cassandra data is on all its replicas.

Consistency levels are as follows:

ONE: Only a single replica must respond.

TWO: Two replicas must respond.

THREE: Three replicas must respond.

QUORUM: A majority of the replicas (floor(n/2) + 1, where n is the total number of replicas) must respond.

ALL: All of the replicas must respond.

LOCAL_QUORUM: A majority of the replicas in the local data center (whichever data center the coordinator is in) must respond.

EACH_QUORUM: A majority of the replicas in each data center must respond.

LOCAL_ONE: Only a single replica must respond. In a multi-data center cluster, this also guarantees that read requests are not sent to replicas in a remote data center.
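The consistency levels above can be summarized as the number of replica responses each one requires. The sketch below is an illustration only: the function names and the example data center names (dc1, dc2) are assumptions, and the per-data-center replication factors are supplied by hand rather than read from a cluster.

```python
def quorum(replicas):
    """Majority of `replicas`: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_responses(level, rf_per_dc, local_dc):
    """Number of replicas that must respond for a consistency level.

    rf_per_dc maps data center name -> replication factor in that data center.
    local_dc is the data center of the coordinator node.
    """
    total = sum(rf_per_dc.values())
    fixed = {"ONE": 1, "LOCAL_ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(total)
    if level == "ALL":
        return total
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])
    if level == "EACH_QUORUM":
        # A majority is needed in every data center, so the counts add up.
        return sum(quorum(rf) for rf in rf_per_dc.values())
    raise ValueError(f"unknown consistency level: {level}")


# Assumed topology: two data centers with three replicas each.
example_rf = {"dc1": 3, "dc2": 3}
```

With this assumed topology, QUORUM needs four of the six replicas, while LOCAL_QUORUM needs only two of the three replicas in the coordinator's data center.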

Replication Strategy: A replication strategy specifies the implementation class that determines the nodes where replicas are placed. Possible strategies are SimpleStrategy and NetworkTopologyStrategy.
Replication Factor: Number of copies (replicas) of the data to keep.
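As an illustration of how a replication strategy and replication factor appear in a keyspace definition, the following sketch builds a CQL CREATE KEYSPACE statement. The helper names, the keyspace name demo_ks, and the data center names are assumptions; how the emitter maps its single Replication Factor field onto NetworkTopologyStrategy data centers is not stated in this document, so the per-data-center factors are passed explicitly here.

```python
def cql_map(options):
    """Render a dict as a CQL map literal; strings are quoted, numbers are not."""
    parts = []
    for key, value in options.items():
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"'{key}': {rendered}")
    return "{" + ", ".join(parts) + "}"


def create_keyspace_cql(keyspace, strategy, replication_factor=None, dc_factors=None):
    """Build a CREATE KEYSPACE statement (hypothetical helper for illustration)."""
    if strategy == "SimpleStrategy":
        if replication_factor is None:
            raise ValueError("SimpleStrategy needs a replication factor")
        options = {"class": strategy, "replication_factor": replication_factor}
    elif strategy == "NetworkTopologyStrategy":
        if not dc_factors:
            raise ValueError("NetworkTopologyStrategy needs per-data-center factors")
        options = {"class": strategy, **dc_factors}
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {cql_map(options)}"
    )


simple_stmt = create_keyspace_cql("demo_ks", "SimpleStrategy", replication_factor=3)
nts_stmt = create_keyspace_cql(
    "demo_ks", "NetworkTopologyStrategy", dc_factors={"dc1": 3, "dc2": 2}
)
```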
Enable TTL: Select the checkbox to enable TTL (Time to Live), so that records persist only for the specified duration.
TTL Value

Appears only when the Enable TTL checkbox is selected.

Provide the TTL value in seconds.
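In CQL, a time to live is attached to a write with the USING TTL clause, expressed in seconds. The sketch below builds such a parameterized INSERT statement; the function name insert_cql and the example keyspace, table, and column names are assumptions, and the emitter's own write path may differ.

```python
def insert_cql(keyspace, table, columns, ttl_seconds=None):
    """Build a parameterized INSERT, optionally with a TTL in seconds."""
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {keyspace}.{table} ({column_list}) VALUES ({placeholders})"
    if ttl_seconds is not None:
        if ttl_seconds < 0:
            raise ValueError("TTL must not be negative")
        stmt += f" USING TTL {int(ttl_seconds)}"
    return stmt


# Assumed example: records expire one day (86400 seconds) after being written.
ttl_stmt = insert_cql("demo_ks", "events", ["device_id", "ts", "value"], ttl_seconds=86400)
```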

Checkpoint Storage Location: Select the checkpointing storage location. Available options are HDFS, S3, and EFS.
Checkpoint Connections: Select the connection. Connections are listed according to the selected storage location.
Checkpoint Directory

The path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path such as /user/hadoop/checkpointingDir; the system adds a suitable prefix by itself.

For S3, enter an absolute path such as S3://BucketName/checkpointingDir

Time-Based Check Point: Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is appended with the current time in milliseconds.
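The effect of the time-based checkpoint option can be sketched as follows. This is an assumption-laden illustration: the function name checkpoint_location is hypothetical, and the document only says the timestamp is appended, so appending it as a trailing path segment is a guess about the exact format.

```python
import time


def checkpoint_location(base_dir, time_based=False, now_millis=None):
    """Return the checkpoint directory to use for one pipeline run."""
    if not time_based:
        return base_dir
    if now_millis is None:
        # Current wall-clock time in milliseconds since the epoch.
        now_millis = int(time.time() * 1000)
    # Assumed format: the timestamp becomes a sub-directory of the base path.
    return f"{base_dir.rstrip('/')}/{now_millis}"


fixed_run = checkpoint_location(
    "S3://BucketName/checkpointingDir", time_based=True, now_millis=1700000000000
)
```

Because each run gets a fresh location under this scheme, a run does not pick up the checkpoint state left by the previous run.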
Batch Size: Number of records picked per batch for insertion into Cassandra.
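Batching by size can be pictured as splitting the outgoing records into fixed-size groups, with a smaller final group if the count does not divide evenly. The generator below is a generic sketch with a hypothetical name, not the emitter's implementation.

```python
from itertools import islice


def batches(records, batch_size):
    """Yield lists of at most `batch_size` records, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch size must be positive")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


example_batches = list(batches(range(5), 2))
```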
Output Mode

Output mode to be used while writing the data to the streaming emitter. Select the output mode from the following three options:

Append: Only the new rows in the streaming data are written to the sink.

Complete Mode: All the rows in the streaming data are written to the sink every time there are updates.

Update Mode: Only the rows that were updated in the streaming data are written to the sink every time there are updates.
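The difference between the three output modes can be shown by comparing the result of a streaming query before and after an update. The sketch below models the result as a dictionary from key to value; the function name is hypothetical, and treating newly added rows as updated in Update Mode is an assumption based on Spark Structured Streaming semantics.

```python
_MISSING = object()


def rows_to_write(mode, previous, current):
    """Select the rows written to the sink after the result changes.

    previous and current map a row key to its value before and after the update.
    """
    if mode == "complete":
        return dict(current)
    if mode == "append":
        # Only keys that did not exist before.
        return {k: v for k, v in current.items() if k not in previous}
    if mode == "update":
        # New keys plus keys whose value changed (assumed Spark semantics).
        return {k: v for k, v in current.items() if previous.get(k, _MISSING) != v}
    raise ValueError(f"unknown output mode: {mode}")


before = {"a": 1, "b": 2}
after = {"a": 1, "b": 3, "c": 4}
```

With these example values, Append writes only row c, Update Mode writes rows b and c, and Complete Mode writes all three rows.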

Save Mode

Save Mode is used to specify the expected behavior of saving data to a data sink.

ErrorifExist: When persisting data, if the data already exists, an exception is expected to be thrown.

Append: When persisting data, if data/table already exists, the contents of the data are expected to be appended to the existing data.

Overwrite: When persisting data, if data/table already exists, existing data is expected to be overwritten by the contents of the data.

Ignore: When persisting data, if data/table already exists, the save operation is expected to not save the contents of the data and to not change the existing data. This is similar to CREATE TABLE IF NOT EXISTS in SQL.
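The four save modes can be summarized with a small simulation in which a table is a list of rows and a missing table is represented by None. The function name save is hypothetical and the mode names are taken as written above; this models the described behavior only and does not call Spark or Cassandra.

```python
def save(existing_rows, new_rows, mode):
    """Return the table contents after saving `new_rows` with the given mode.

    existing_rows is None when the target table does not exist yet.
    """
    new_rows = list(new_rows)
    if existing_rows is None:
        # Nothing exists yet, so every mode simply writes the data.
        return new_rows
    if mode == "ErrorifExist":
        raise RuntimeError("target already exists")
    if mode == "Append":
        return list(existing_rows) + new_rows
    if mode == "Overwrite":
        return new_rows
    if mode == "Ignore":
        # Leave the existing data untouched and drop the new rows.
        return list(existing_rows)
    raise ValueError(f"unknown save mode: {mode}")


existing = ["row1"]
incoming = ["row2"]
```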

Enable Trigger: A trigger defines how frequently a streaming query is executed.
Processing Time: Appears only when the Enable Trigger checkbox is selected. Processing Time is the trigger interval, in minutes or seconds.
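As a small illustration of a processing-time trigger interval, the sketch below converts a value and a unit into an interval string such as "5 minutes", the form accepted by the processingTime argument of a Spark Structured Streaming trigger, and into a number of seconds. The function names are hypothetical, and how the emitter passes the interval to Spark is not described in this document.

```python
_SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60}


def interval_string(value, unit):
    """Format a trigger interval as text, e.g. interval_string(5, "minutes")."""
    if unit not in _SECONDS_PER_UNIT:
        raise ValueError("unit must be 'seconds' or 'minutes'")
    if value <= 0:
        raise ValueError("interval must be positive")
    return f"{value} {unit}"


def interval_seconds(value, unit):
    """Return the same interval expressed in seconds."""
    interval_string(value, unit)  # reuse the validation above
    return value * _SECONDS_PER_UNIT[unit]
```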
Add Configuration: Enables you to configure additional Cassandra properties.