Neo4j

The Neo4j emitter enables you to write both batch and streaming input datasets to a configured database of the Neo4j service. To add a Neo4j emitter to your pipeline, drag the emitter onto the canvas, connect it to a data source or processor, and click on it to configure it.

Connection Name: All the Neo4j connections are listed here. Select a connection for connecting to Neo4j.
Database Name: Name of the Neo4j database to which the data will be written.
Write Mode: Option that specifies how the data is written to the database. The available options are: Cypher Query, Nodes, and Relationships.

Upon selecting the Cypher Query option as write mode, the below field is displayed:

Cypher Query: Persist the entire dataset by using the provided Cypher query. Example: CREATE (n:Person {fullName: event.name + event.surname}).
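
The emitter's write modes correspond closely to the options of the open-source Neo4j Spark Connector. Assuming that mapping, a minimal Scala sketch of a Cypher Query write is shown below; the URL, credentials, and input DataFrame are illustrative, and each incoming row is exposed to the query as event.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("neo4j-cypher-sketch").getOrCreate()
import spark.implicits._

// Illustrative input: one row per person to persist.
val df = Seq(("John", "Doe"), ("Jane", "Roe")).toDF("name", "surname")

// The query is executed for every row; the row is bound as `event`,
// so event.name and event.surname refer to the DataFrame columns.
df.write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Append)
  .option("url", "neo4j://localhost:7687")            // assumed connection URL
  .option("authentication.basic.username", "neo4j")   // assumed credentials
  .option("authentication.basic.password", "secret")
  .option("database", "neo4j")                        // Database Name field
  .option("query", "CREATE (n:Person {fullName: event.name + event.surname})")
  .save()
```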

Upon selecting the Nodes option as write mode, the below fields are displayed:

Nodes: Persist the entire dataset as nodes. The nodes are sent to Neo4j in batches of the size defined in the Batch Size field.
Node Keys: The key:value pairs, where the key is the DataFrame column name and the value is the node property name.
Save Mode: Save Mode specifies the expected behavior of saving data to the data sink.

Append: When persisting data, if the data/table already exists, the contents of the DataFrame are expected to be appended to the existing data.

Overwrite: When persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame.
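
Assuming the same connector mapping, a Nodes-mode write is sketched below: the labels, node.keys, and batch.size options correspond to the Nodes, Node Keys, and Batch Size fields. With Overwrite the connector merges on the node keys, while Append always creates new nodes.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("neo4j-nodes-sketch").getOrCreate()
import spark.implicits._

val df = Seq(("John", "Doe"), ("Jane", "Roe")).toDF("name", "surname")

df.write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Overwrite)                 // Save Mode: Overwrite merges on the keys below
  .option("url", "neo4j://localhost:7687")  // assumed connection URL
  .option("labels", ":Person")              // label(s) for the persisted nodes
  .option("node.keys", "name,surname")      // Node Keys: DataFrame columns used as node keys
  .option("batch.size", "5000")             // Batch Size field (default 5000)
  .save()
```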

Upon selecting the Relationships option as write mode, the below fields are displayed:

Relationship: Gathr provides an option to define the type of relationship. For example, in a given dataset you can map columns to source and target nodes, creating the specified relationship between them.
Relationship Save Strategy: The save strategy to be used when writing relationships.
Relationship Properties: Map used as keys for specifying the relationship properties. Used only if the keys option is selected as the Relationship Save Strategy.
Source Labels: Colon-separated list of labels to attach to the source node.
Source Save Mode: Save mode for the source nodes.
Source Node Keys: Map used as keys for matching the source nodes.
Source Node Properties: Map used as keys for specifying the source node properties. Used only if the keys option is selected as the Relationship Save Strategy.
Target Labels: Colon-separated list of labels that identify the target node.
Target Save Mode: Save mode for the target nodes.
Target Node Keys: Map used as keys for matching the target nodes.
Target Node Properties: Map used as keys for specifying the target node properties. Used only if the keys option is selected as the Relationship Save Strategy.
Save Mode: Save Mode specifies the expected behavior of saving data to the data sink.

Append: When persisting data, if the data/table already exists, the contents of the DataFrame are expected to be appended to the existing data.

Overwrite: When persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame.
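
Again assuming the connector mapping, a Relationships-mode write might look like the sketch below. The BOUGHT relationship type, labels, and key columns are illustrative; with the keys save strategy, the source and target nodes are built from the mapped columns and the specified relationship is created between them.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("neo4j-rel-sketch").getOrCreate()
import spark.implicits._

// Illustrative input: one row per purchase.
val df = Seq(("John Doe", "Laptop", 1), ("Jane Roe", "Phone", 2))
  .toDF("customerName", "productName", "quantity")

df.write
  .format("org.neo4j.spark.DataSource")
  .mode(SaveMode.Overwrite)
  .option("url", "neo4j://localhost:7687")                      // assumed connection URL
  .option("relationship", "BOUGHT")                             // Relationship field
  .option("relationship.save.strategy", "keys")                 // Relationship Save Strategy
  .option("relationship.properties", "quantity")                // Relationship Properties
  .option("relationship.source.labels", ":Customer")            // Source Labels
  .option("relationship.source.save.mode", "Overwrite")         // Source Save Mode
  .option("relationship.source.node.keys", "customerName:name") // Source Node Keys (column:property)
  .option("relationship.target.labels", ":Product")             // Target Labels
  .option("relationship.target.save.mode", "Overwrite")         // Target Save Mode
  .option("relationship.target.node.keys", "productName:name")  // Target Node Keys
  .save()
```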

The remaining fields are described below:

Output Fields: Fields in the message that need to be a part of the output data can be selected from the drop-down list.
Batch Size: The number of rows sent to Neo4j in each batch. Default is 5000.
Priority: Option to define the execution order for the emitter.

In case of a streaming input dataset, the below options are available:

Checkpoint Storage Location: Select the checkpointing storage location. Available options are HDFS, S3, and EFS.
Checkpoint Connections: Select a connection. The connections listed correspond to the selected storage location.
Override Credential: Override credentials for user-specific actions. Upon checking this option, provide the Username, i.e., the name of the user through which the Hadoop services are running.
Check Point Directory: The HDFS path where the Spark application stores the checkpoint data.
Time-based Check Point: Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each run the checkpoint location provided above is appended with the current time in milliseconds.
Enable Trigger: A trigger defines how frequently a streaming query should be executed.
Trigger Type: Available options are: One-Time Micro Batch (a trigger that processes only one batch of data in a streaming query and then terminates the query) and Fixed-Interval Micro Batches (a trigger policy that runs a query periodically based on an interval in processing time; provide the Processing Time if you select this option).
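
For streaming input, the checkpoint and trigger fields map to standard Spark Structured Streaming settings. A minimal sketch follows, again assuming the connector mapping; the rate source, checkpoint path, and 10-second interval are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("neo4j-stream-sketch").getOrCreate()

// Illustrative unbounded source; in Gathr this would be the upstream channel.
val streamingDf = spark.readStream.format("rate").load()
  .selectExpr("CAST(value AS STRING) AS name")

val query = streamingDf.writeStream
  .format("org.neo4j.spark.DataSource")
  .option("url", "neo4j://localhost:7687")                // assumed connection URL
  .option("save.mode", "Append")                          // streaming writes take the save mode as an option
  .option("checkpointLocation", "/tmp/neo4j-checkpoint")  // Check Point Directory field
  .option("labels", ":Person")
  // Fixed-Interval Micro Batches: run a micro-batch every 10 seconds.
  // For One-Time Micro Batch, use Trigger.Once() instead.
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()

query.awaitTermination()
```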