Vertica ETL Target

The Vertica emitter supports Oracle, Postgres, MySQL, MSSQL, and DB2 connections.

You can configure and connect the above-mentioned DB engines with JDBC. This allows your data pipeline to emit data in batches into DB2 and the other supported databases once the JDBC channel is configured.

To use DB2, first create a successful DB2 Connection.

Vertica Emitter Configuration

To add a Vertica emitter to your pipeline, drag it onto the canvas and connect it to a Data Source or processor.

The configuration settings of the Vertica emitter are as follows:

Connection Name

Connections are the service identifiers. A connection name can be selected from the list if you have already created and saved connection details for Vertica. Alternatively, create one as explained in the topic - Vertica Connection →

Use the Test Connection option to ensure that the connection with the Vertica channel is established successfully.

A success message states that the connection is available. In case of any error in test connection, edit the connection to resolve the issue before proceeding further.

Message Name

The name of the message configuration which will act as metadata for the actual data.

Table Name

Name of an existing table in the specified database.

Is Batch Enabled

Enable this parameter to batch multiple messages and improve write performance.

Batch Size

Batch Size determines how many rows to insert per round trip. This can improve performance on JDBC drivers. This option applies only to writing and defaults to 1000.
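
Under the hood, this typically corresponds to the batchsize option of Spark's JDBC DataFrame writer. The following Scala sketch is illustrative only; the connection URL, driver class, credentials, table name, and input path are placeholder assumptions, not values taken from this product.

    import org.apache.spark.sql.{SaveMode, SparkSession}

    // Minimal sketch, assuming the emitter delegates to Spark's JDBC writer.
    // Host, database, credentials, table name, and input path are placeholders.
    val spark = SparkSession.builder().appName("vertica-emitter-sketch").getOrCreate()
    val df = spark.read.json("/tmp/events.json")

    df.write
      .format("jdbc")
      .option("url", "jdbc:vertica://vertica-host:5433/analytics")
      .option("driver", "com.vertica.jdbc.Driver")
      .option("dbtable", "public.sales")
      .option("user", "dbadmin")
      .option("password", "********")
      .option("batchsize", "1000")   // rows inserted per round trip
      .mode(SaveMode.Append)
      .save()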

Connection Retries

Number of retries for the component connection. Possible values are -1, 0, or a positive number, where -1 denotes infinite retries.

If Routing Required = true, then:

Routing Policy - A JSON defining the custom routing policy.

Example: {"1":{"company":{"Google":20.0,"Apple":80.0}}}

Here, 1 is the timestamp after which the custom routing policy becomes active, 'company' is the field name, the value 'Google' takes 20% of the shards, and the value 'Apple' takes 80% of the shards.
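
The same example policy, shown expanded purely for readability (the timestamp key, field name, and shard percentages are the illustrative values from above):

    {
      "1": {
        "company": {
          "Google": 20.0,
          "Apple": 80.0
        }
      }
    }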

Save Mode

Save Mode is used to specify the expected behavior of saving data to a data sink.

  • ErrorIfExists: When persisting data, if the data already exists, an exception is expected to be thrown.

  • Append: When persisting data, if the data/table already exists, the incoming contents are expected to be appended to the existing data.

  • Overwrite: When persisting data, if the data/table already exists, the existing data is expected to be overwritten by the incoming contents.

  • Ignore: When persisting data, if the data/table already exists, the save operation is expected to neither save the incoming contents nor change the existing data.

    This is similar to a CREATE TABLE IF NOT EXISTS in SQL.
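
These options correspond to Spark's SaveMode values, assuming the emitter delegates the write to a standard DataFrameWriter as in the Batch Size sketch above; the credentials and URL are placeholders.

    import java.util.Properties
    import org.apache.spark.sql.SaveMode

    // Sketch only: credentials and URL are placeholders; df is the DataFrame
    // being emitted (see the Batch Size sketch above).
    val props = new Properties()
    props.setProperty("user", "dbadmin")
    props.setProperty("password", "********")
    props.setProperty("driver", "com.vertica.jdbc.Driver")

    df.write
      .mode(SaveMode.Append)   // or SaveMode.ErrorIfExists / Overwrite / Ignore
      .jdbc("jdbc:vertica://vertica-host:5433/analytics", "public.sales", props)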

Ignore Missing Values

Specifies whether empty or null values of message fields should be ignored or persisted in the sink.

Delay Between Connection Retries

Defines the retry delay interval for the component connection, in milliseconds.

Enable TTL

When selected, data will be discarded to the specified TTL exchange.

Checkpoint Storage Location

Select the checkpointing storage location. Available options are HDFS, S3, and EFS.

Checkpoint Connections

Select the connection. Connections are listed corresponding to the selected storage location.

Checkpoint Directory

It is the path where the Spark application stores the checkpointing data.

For HDFS and EFS, enter a relative path like /user/hadoop/checkpointingDir; the system will add a suitable prefix by itself.

For S3, enter an absolute path like: s3://BucketName/checkpointingDir
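
In a streaming pipeline this directory is typically what gets passed to Spark Structured Streaming as the checkpointLocation option. A minimal sketch, assuming a streaming DataFrame named streamingDf and a placeholder console sink:

    // Sketch: the relative (HDFS/EFS) or absolute (S3) path described above
    // ends up as the query's checkpointLocation.
    val query = streamingDf.writeStream
      .format("console")   // placeholder sink
      .option("checkpointLocation", "/user/hadoop/checkpointingDir")   // or "s3://BucketName/checkpointingDir"
      .start()

    query.awaitTermination()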

Output Mode

Output mode to be used while writing the data to the streaming sink; a short sketch follows the Enable Trigger section below.

  • Append: Output Mode in which only the new rows in the streaming data will be written to the sink.

  • Complete Mode: Output Mode in which all the rows in the streaming data will be written to the sink every time there are some updates.

  • Update Mode: Output Mode in which only the rows that were updated in the streaming data will be written to the sink every time there are some updates.

Enable Trigger

Trigger defines how frequently a streaming query should be executed.
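
Both the output mode and the trigger interval map naturally onto Spark Structured Streaming's writeStream options. A hedged sketch, reusing the placeholder streaming DataFrame and console sink from the checkpointing example above; the 30-second interval is illustrative:

    import org.apache.spark.sql.streaming.Trigger

    // "append" | "complete" | "update" correspond to the output modes above.
    val query = streamingDf.writeStream
      .outputMode("append")
      .trigger(Trigger.ProcessingTime("30 seconds"))   // query execution frequency
      .format("console")   // placeholder sink
      .option("checkpointLocation", "/user/hadoop/checkpointingDir")
      .start()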

Schema Results

Table Column Name

Name of the column populated from the selected Table.

Mapping Value

Map a corresponding value to the column.

Database Data Type

Data type of the Mapped Value.

Ignore All

Select the Ignore All check box to ignore all the Schema Results, or select the check box adjacent to a column to ignore that column from the Schema Results.

Use Ignore All or the selected fields while pushing data to the emitter.

This will add the field as part of the partition fields while creating the table.

Auto Fill

Auto Fill automatically populates and maps all incoming schema fields with the fetched table columns. The left side shows the table columns and the right side shows the incoming schema fields.

If a field matching a table column is not found in the incoming schema, the first field is selected by default.

Download Mapping

Downloads the mappings of schema fields and table columns to a file.

Upload Mapping

Uploading the mapping file automatically populates the table columns and schema fields.


Add Configuration: Additional properties can be added using this option as key-value pairs.


Post Action

To understand how to provide SQL queries or Stored Procedures that will be executed during pipeline run, see Post-Actions β†’


Notes

Optionally, enter notes in the Notes β†’ tab and save the configuration.
