JDBC Emitter

The JDBC Emitter allows you to push data to relational databases such as MySQL, PostgreSQL, Oracle DB, and MS-SQL.

The JDBC emitter also enables you to write data to a DB2 database in both batch and streaming pipelines.

It is enriched with lookup functionality for DB2, so you can enrich fields with external data read from DB2. Select a DB2 connection while configuring the JDBC emitter.

JDBC Emitter Configuration

To add a JDBC emitter to your pipeline, drag the JDBC emitter onto the canvas and connect it to a Data Source or processor.

The configuration settings of the JDBC emitter are as follows:

Save as Dataset: Save the JDBC emitter configuration as a Dataset.
Dataset Name: Provide a unique name for the Dataset.
Connection Name: All JDBC connections are listed here. Select a connection for connecting to JDBC.
Message Name: Name of the message configuration that acts as metadata.
Schema Name: Existing database schema whose tables are fetched (for MSSQL, DB2, and POSTGRES).
Table Name: Existing table name in the specified database.
Is Batch Enable: Enable this parameter to batch multiple messages and improve write performance.
Batch Size: Determines how many rows to insert per round trip; this can help performance on JDBC drivers. It applies only to writing and defaults to 1000.
Connection Retries: Number of retries for the component connection.
Delay Between Connection Retries: Retry delay interval for the component connection, in milliseconds.
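As a rough sketch of why batching matters, the snippet below uses Python's sqlite3 module as a stand-in for a JDBC target: rows are inserted in chunks of a configurable batch size instead of one statement per row. The table and data here are made up for illustration; the actual emitter performs this batching through the JDBC driver.

```python
import sqlite3

# Illustration only: sqlite3 stands in for a JDBC target to show how a
# batch size groups many rows into fewer round trips (cf. Batch Size = 1000).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")

rows = [(i, f"msg-{i}") for i in range(2500)]
BATCH_SIZE = 1000  # the emitter's default batch size

# Insert in chunks of BATCH_SIZE rather than one round trip per row.
for start in range(0, len(rows), BATCH_SIZE):
    batch = rows[start:start + BATCH_SIZE]
    conn.executemany("INSERT INTO events VALUES (?, ?)", batch)
    conn.commit()

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 2500
```

With 2500 rows and a batch size of 1000, the data moves in three round trips instead of 2500.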
Save Mode

Save Mode specifies the expected behavior of saving data to a data sink.

Append: When persisting data, if the data/table already exists, the contents of the DataFrame are expected to be appended to the existing data.

Overwrite: When persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame.

Ignore: When persisting data, if the data/table already exists, the save operation is expected to neither save the contents of the DataFrame nor change the existing data.

This is similar to a CREATE TABLE IF NOT EXISTS in SQL.
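The three save modes can be sketched with a small Python example, with sqlite3 again standing in for the JDBC target. The save function below only illustrates the semantics described above; it is not the emitter's implementation.

```python
import sqlite3

# Illustration only: emulate the Append / Overwrite / Ignore save modes.
def save(conn, table, rows, mode):
    exists = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone() is not None
    if mode == "ignore" and exists:
        return                                # leave existing data untouched
    if mode == "overwrite" and exists:
        conn.execute(f"DELETE FROM {table}")  # replace existing contents
    if not exists:
        conn.execute(f"CREATE TABLE {table} (v TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [(r,) for r in rows])

conn = sqlite3.connect(":memory:")
save(conn, "t", ["a"], "append")      # table created, one row
save(conn, "t", ["b"], "append")      # rows accumulate -> a, b
save(conn, "t", ["c"], "ignore")      # table exists, nothing written
save(conn, "t", ["d"], "overwrite")   # existing rows replaced -> d
result = [r[0] for r in conn.execute("SELECT v FROM t ORDER BY v")]
print(result)  # ['d']
```

Note how Ignore behaves like CREATE TABLE IF NOT EXISTS: the write is silently skipped once the table exists.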

Checkpoint Directory: The HDFS path where the Spark application stores the checkpoint data.
Output Mode

Output Mode specifies what data is written to a streaming sink when new data becomes available.

Select the output mode from the given three options:

Append: Output mode in which only the new rows in the streaming data are written to the sink.

Complete: Output mode in which all the rows in the streaming data are written to the sink every time there are updates.

Update: Output mode in which only the rows that were updated in the streaming data are written to the sink every time there are updates.
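A toy Python simulation of a running count can make the three modes concrete. The emitted lists below are an assumption about what each mode would hand to the sink per micro-batch, not actual Spark output; in real Structured Streaming, Append with aggregations additionally requires a watermark.

```python
# Illustration only: which rows each streaming output mode would hand to
# the sink after every micro-batch of a running count.
from collections import Counter

batches = [["a", "b"], ["b", "c"]]  # two toy micro-batches

state = Counter()
emitted = {"append": [], "complete": [], "update": []}

for batch in batches:
    before = dict(state)
    state.update(batch)
    # Append: only keys never seen before (new rows only).
    emitted["append"].append(sorted(k for k in state if k not in before))
    # Complete: the full result table, every time.
    emitted["complete"].append(sorted(state.items()))
    # Update: only keys whose value changed in this batch.
    emitted["update"].append(
        sorted((k, v) for k, v in state.items() if before.get(k) != v))

print(emitted["append"])   # [['a', 'b'], ['c']]
print(emitted["update"])   # [[('a', 1), ('b', 1)], [('b', 2), ('c', 1)]]
```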

Enable Trigger: Defines how frequently a streaming query should be executed.
Processing Time

It appears only when the Enable Trigger checkbox is selected.

Processing Time is the trigger time interval in minutes or seconds.

ADD CONFIGURATION

Enables you to configure custom properties.

Provide the spark-mssql-connector configuration for the JDBC emitter using the properties below:


sqlConnectorName: com.microsoft.sqlserver.jdbc.spark

tableLock: false

schemaCheckEnabled: false
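As an illustrative sketch, the properties above could be collected into a writer configuration along these lines, where the connector name selects the data source format and the remaining keys become writer options. The dictionary below is only a mock-up of that mapping, not Gathr's internal handling.

```python
# Illustration only: the ADD CONFIGURATION key/value pairs gathered into
# writer options. tableLock and schemaCheckEnabled mirror options of the
# spark-mssql-connector data source.
custom_properties = {
    "sqlConnectorName": "com.microsoft.sqlserver.jdbc.spark",
    "tableLock": "false",           # do not take a table lock for bulk insert
    "schemaCheckEnabled": "false",  # skip the strict schema compatibility check
}

# The connector name selects the data source format; everything else is
# passed through as writer options.
fmt = custom_properties.pop("sqlConnectorName")
print(fmt)                         # com.microsoft.sqlserver.jdbc.spark
print(sorted(custom_properties))   # ['schemaCheckEnabled', 'tableLock']
```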

Schema Results: Map the values coming in the pipeline to the table column names.
Table Column Name: The columns of the selected table are populated here.
Mapping Value: Enter a mapping value for the corresponding column.
Database Data Type: The data type of the value, i.e., String, Int, Text, and so on.
Ignore All: Use Ignore All or selected fields while pushing data to the emitter.

Sample Gathr Expression Evaluator:

(_replaceMissingValues(<Col Name>, '', 'replaceNullAndEmpty', string))
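Assuming the expression replaces null or empty string values with the supplied default (the replaceNullAndEmpty strategy), its behavior can be sketched in Python as follows. The function name and semantics here are an approximation for illustration, not Gathr's implementation.

```python
# Illustration only: assumed semantics of
# _replaceMissingValues(<Col Name>, '', 'replaceNullAndEmpty', string) --
# substitute a default for null or empty string values.
def replace_missing_values(value, default="", strategy="replaceNullAndEmpty"):
    if strategy == "replaceNullAndEmpty" and (value is None or value == ""):
        return default
    return value

print(replace_missing_values(None, "n/a"))  # n/a
print(replace_missing_values("", "n/a"))    # n/a
print(replace_missing_values("ok", "n/a"))  # ok
```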

Click the NEXT button.

Enter the notes in the space provided.

Click DONE to save the configuration.
