JDBC Emitter
The JDBC emitter allows you to push data to relational databases such as MySQL, PostgreSQL, Oracle DB, and MS-SQL.
The JDBC emitter also enables you to write data to a DB2 database, in both batch and streaming pipelines.
It is enriched with lookup functionality for the DB2 database, so you can enrich fields with external data read from DB2. Select a DB2 connection while configuring the JDBC emitter.
JDBC Emitter Configuration
To add a JDBC emitter to your pipeline, drag the JDBC emitter onto the canvas and connect it to a Data Source or processor.
The configuration settings of the JDBC emitter are as follows:
Field | Description |
---|---|
Save as Dataset | Save the JDBC emitter configuration as a Dataset. |
Dataset Name | Provide a unique name for the Dataset. |
Connection Name | All available JDBC connections are listed here. Select a connection to connect to the database. |
Message Name | Name of the message configuration that acts as metadata. |
Schema Name | Existing database schema whose tables are fetched (for MSSQL, DB2, and POSTGRES). |
Table Name | Existing table name in the specified database. |
Is Batch Enable | Enable this parameter to batch multiple messages and improve write performance. |
Batch Size | Determines how many rows to insert per round trip. This can improve performance on JDBC drivers. This option applies only to writing and defaults to 1000. |
Connection Retries | Number of retries for the component connection. |
Delay Between Connection Retries | Retry delay interval for the component connection, in milliseconds. |
Save Mode | Specifies the expected behavior of saving data to the data sink (see the batch write sketch after this table). Append: if the data/table already exists, the new data is appended to the existing data. Overwrite: if the data/table already exists, the existing data is overwritten by the new data. Ignore: if the data/table already exists, the save operation neither writes the new data nor changes the existing data. This is similar to CREATE TABLE IF NOT EXISTS in SQL. |
Check Point Directory | The HDFS path where the Spark application stores the checkpoint data. |
Output Mode | Specifies what data is written to a streaming sink when new data is available (see the streaming sketch after this table). Select one of the three options: Append: only the new rows in the streaming data are written to the sink. Complete: all the rows in the streaming data are written to the sink every time there are updates. Update: only the rows that were updated in the streaming data are written to the sink every time there are updates. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Processing Time | Appears only when the Enable Trigger checkbox is selected. Processing Time is the trigger time interval in minutes or seconds. |
ADD CONFIGURATION | Enables you to configure custom properties. To use the spark-mssql-connector for the JDBC emitter, provide the following configurations: tableLock and schemaCheckEnabled. If these configurations are not provided, the emitter uses plain JDBC for writing data. |
Schema Results | Map the values coming in the pipeline to the table column names. |
Table Column Name | The columns of the selected table are populated here. |
Mapping Value | Enter a mapping value for the corresponding column. |
Database Data Type | The data type of the value, i.e., String, Int, Text, and so on. |
Ignore All | Ignore all or only selected fields while pushing data to the emitter. |
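As a rough illustration of how the batch-related settings above (Batch Size, Save Mode) map onto a plain Spark JDBC write, here is a minimal sketch. The connection URL, table, credentials, and input path are placeholders for illustration, not Gathr internals:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("jdbc-emitter-batch-sketch")
  .getOrCreate()

// Stand-in for the data flowing through the pipeline.
val df = spark.read.parquet("/data/input")

df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/sales")  // placeholder connection
  .option("dbtable", "public.orders")                     // Schema Name + Table Name
  .option("user", "writer")                               // placeholder credentials
  .option("password", "secret")
  .option("batchsize", "1000")                            // Batch Size: rows per round trip
  .mode(SaveMode.Append)                                  // Save Mode: Append / Overwrite / Ignore
  .save()

// For MS-SQL, ADD CONFIGURATION can switch the write from plain JDBC to the
// spark-mssql-connector, using the options named in the table above, e.g.:
//   .format("com.microsoft.sqlserver.jdbc.spark")
//   .option("tableLock", "true")
//   .option("schemaCheckEnabled", "false")
```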
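For streaming pipelines, the Check Point Directory, Output Mode, and Processing Time settings correspond to standard Structured Streaming options. Spark has no built-in JDBC streaming sink, so a common pattern is to write each micro-batch through the batch JDBC writer via foreachBatch. The sketch below is a hypothetical, self-contained illustration of that pattern; the Kafka source, URL, and paths are placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder()
  .appName("jdbc-emitter-streaming-sketch")
  .getOrCreate()

// Stand-in streaming source.
val streamDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "orders")
  .load()
  .selectExpr("CAST(value AS STRING) AS payload")  // placeholder projection

streamDf.writeStream
  .outputMode("append")                                             // Output Mode
  .option("checkpointLocation", "hdfs:///checkpoints/jdbc-emitter") // Check Point Directory
  .trigger(Trigger.ProcessingTime("30 seconds"))                    // Enable Trigger + Processing Time
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Each micro-batch is written with the plain JDBC batch writer.
    batch.write
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/sales")
      .option("dbtable", "public.orders")
      .option("user", "writer")
      .option("password", "secret")
      .option("batchsize", "1000")
      .mode(SaveMode.Append)
      .save()
  }
  .start()
```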
Sample Gathr Expression Evaluator:

```
(_replaceMissingValues(<Col Name>, '', 'replaceNullAndEmpty', string))
```
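For reference, here is a minimal Spark sketch of what this expression appears to do, assuming replaceNullAndEmpty means: substitute the replacement value ('' above) wherever the column is null or an empty string. The DataFrame df and column name col_name are hypothetical stand-ins for the pipeline data and <Col Name>:

```scala
import org.apache.spark.sql.functions.{col, lit, when}

// Replace null or empty values in a string column with the
// replacement value '' (empty string).
val cleaned = df.withColumn(
  "col_name",  // stand-in for <Col Name>
  when(col("col_name").isNull || col("col_name") === "", lit(""))
    .otherwise(col("col_name"))
)
```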
Click the NEXT button.
Enter the notes in the space provided.
Click DONE to save the configuration.