Vertica ETL Target
The Vertica emitter supports Oracle, PostgreSQL, MySQL, MSSQL, and DB2 connections.
You can configure and connect the above-mentioned DB engines with JDBC. This allows you to emit data from your data pipeline into DB2 and the other supported targets in batches after configuring the JDBC channel.
To use DB2, first create a successful DB2 connection.
Vertica Emitter Configuration
To add a Vertica emitter to your pipeline, drag it onto the canvas and connect it to a Data Source or processor.
The configuration settings of the Vertica emitter are as follows:
Connection Name
Connections are the service identifiers. A connection name can be selected from the list if you have created and saved connection details for Vertica earlier, or create one as explained in the topic Vertica Connection.
Use the Test Connection option to ensure that the connection with Vertica is established successfully.
A success message states that the connection is available. In case of any error in test connection, edit the connection to resolve the issue before proceeding further.
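If you want to sanity-check connectivity outside Gathr, the same JDBC handshake can be exercised with a few lines of code. The sketch below assumes the Vertica JDBC driver is on the classpath; the host, database, and credentials are placeholders.

```scala
import java.sql.DriverManager

// Minimal connectivity check against Vertica over JDBC (a sketch, not a Gathr component).
// Host, port, database, and credentials below are placeholders.
object VerticaConnectionCheck {
  def main(args: Array[String]): Unit = {
    // Typical Vertica JDBC URL format: jdbc:vertica://<host>:<port>/<database>
    val url  = "jdbc:vertica://vertica-host:5433/exampledb"
    val conn = DriverManager.getConnection(url, "dbadmin", "secret")
    try println(s"Connected: ${!conn.isClosed}")
    finally conn.close()
  }
}
```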
Message Name
The name of the message configuration which will act as metadata for the actual data.
Table Name
Name of an existing table in the specified database.
Is Batch Enabled
Enable this parameter to batch multiple messages and improve write performance.
Batch Size
Determines how many rows to insert per round trip. This can improve performance on JDBC drivers. This option applies only to writing and defaults to 1000.
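For context, batching maps onto the standard Spark JDBC batchsize write option. The sketch below shows the idea; the URL, table, and credentials are placeholders rather than Gathr settings.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Sketch of a Spark JDBC write with an explicit batch size (placeholder URL, table, and credentials).
val spark = SparkSession.builder.appName("vertica-batch-write").getOrCreate()
val df = spark.read.json("/tmp/events.json")

df.write
  .format("jdbc")
  .option("url", "jdbc:vertica://vertica-host:5433/exampledb")
  .option("dbtable", "public.events")
  .option("user", "dbadmin")
  .option("password", "secret")
  .option("batchsize", "1000") // rows inserted per round trip; 1000 is the default
  .mode(SaveMode.Append)
  .save()
```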
Connection Retries
Number of retries for the component connection. Possible values are -1, 0, or a positive number, where -1 denotes infinite retries.
If Routing Required is set to true, then:
Routing Policy - A JSON defining the custom routing policy.
Example: {"1":{"company":{"Google":20.0,"Apple":80.0}}}
Here, 1 is the timestamp after which the custom routing policy becomes active, company is the field name, and the values Google and Apple take 20% and 80% of the shards respectively.
Save Mode
Save Mode is used to specify the expected behavior of saving data to a data sink.
ErrorifExist: When persisting data, if the data already exists, an exception is expected to be thrown.
Append: When persisting data, if data/table already exists, contents of the Schema are expected to be appended to existing data.
Overwrite: When persisting data, if data/table already exists, existing data is expected to be overwritten by the contents of the Data.
Ignore: When persisting data, if data/table already exists, the save operation is expected to not save the contents of the Data and to not change the existing data.
This is similar to a CREATE TABLE IF NOT EXISTS in SQL.
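These options correspond to Spark's SaveMode values. The mapping below is an illustrative sketch based on the UI labels, not Gathr's internal code.

```scala
import org.apache.spark.sql.SaveMode

// Illustrative mapping from the emitter's Save Mode labels to Spark's SaveMode values.
def toSparkSaveMode(label: String): SaveMode = label match {
  case "ErrorifExist" => SaveMode.ErrorIfExists // throw if data already exists
  case "Append"       => SaveMode.Append        // append to existing data
  case "Overwrite"    => SaveMode.Overwrite     // replace existing data
  case "Ignore"       => SaveMode.Ignore        // skip the write, leave existing data untouched
  case other          => throw new IllegalArgumentException(s"Unknown save mode: $other")
}
```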
Ignore Missing Values
Choose whether to ignore or persist empty or null values of message fields in the sink.
Delay Between Connection Retries
Defines the delay between retries of the component connection, in milliseconds.
Enable TTL
When selected, data will be discarded to the specified TTL exchange.
Checkpoint Storage Location
Select the checkpointing storage location. Available options are HDFS, S3, and EFS.
Checkpoint Connections
Select the connection. Connections are listed corresponding to the selected storage location.
Checkpoint Directory
It is the path where the Spark application stores the checkpointing data.
For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself.
For S3, enter an absolute path like: S3://BucketName/checkpointingDir
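For reference, the checkpoint directory corresponds to Spark's checkpointLocation option on a streaming write. The sketch below uses a rate source and console sink as stand-ins for the pipeline's actual source and the Vertica emitter.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: the checkpoint directory is passed to Spark as the checkpointLocation option.
val spark = SparkSession.builder.appName("checkpoint-example").getOrCreate()
val streamDf = spark.readStream.format("rate").load()

streamDf.writeStream
  .option("checkpointLocation", "/user/hadoop/checkpointingDir")       // HDFS/EFS: relative path
  // .option("checkpointLocation", "s3://BucketName/checkpointingDir") // S3: absolute path
  .format("console")
  .start()
```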
Output Mode
Output mode to be used while writing the data to the streaming sink.
Append: Output Mode in which only the new rows in the streaming data will be written to the sink.
Complete Mode: Output Mode in which all the rows in the streaming data will be written to the sink every time there are some updates.
Update Mode: Output Mode in which only the rows that were updated in the streaming data will be written to the sink every time there are some updates.
Enable Trigger
Trigger defines how frequently a streaming query should be executed.
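For context, Output Mode and Trigger map onto the outputMode and trigger settings of a Spark streaming write. The sketch below uses placeholder source, sink, and interval values.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

// Sketch: output mode and trigger on a streaming write (placeholder source, sink, and interval).
val spark = SparkSession.builder.appName("trigger-example").getOrCreate()
val streamDf = spark.readStream.format("rate").load()

streamDf.writeStream
  .outputMode("append")                          // or "complete" / "update"
  .trigger(Trigger.ProcessingTime("30 seconds")) // how frequently the query executes
  .format("console")
  .start()
```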
Schema Results
Table Column Name
Name of the column populated from the selected Table.
Mapping Value
Map a corresponding value to the column.
Database Data Type
Data type of the Mapped Value.
Ignore All
Select the Ignore All check box to ignore all the Schema Results, or select the check box adjacent to a column to ignore that column from the Schema Results.
Use Ignore All or the selected fields while pushing data to the emitter.
This will add that field as part of the partition fields while creating the table.
Auto Fill
Auto Fill automatically populates and maps all incoming schema fields to the fetched table columns. The left side shows the table columns and the right side shows the incoming schema fields.
If a field matching a table column is not found in the incoming schema, the first field is selected by default.
Download Mapping
Downloads the mappings of schema fields and table columns to a file.
Upload Mapping
Uploading the mapping file automatically populates the table columns and schema fields.
Add Configuration: Additional properties can be added using this option as key-value pairs.
Post Action
To understand how to provide SQL queries or Stored Procedures that will be executed during pipeline run, see Post-Actions.
Notes
Optionally, enter notes in the Notes tab and save the configuration.