Neo4j
The Neo4j emitter enables you to write both batch and streaming input datasets to a configured database of a Neo4j service. To add a Neo4j emitter to your pipeline, drag and drop the emitter onto the canvas, connect it to a data source or processor, and click on it to configure it.
Field | Description |
---|---|
Connection Name | All the Neo4j connections will be listed here. Select a connection for connecting to Neo4j. |
Database Name | Name of the Neo4j database to which the data will be written. |
Write Mode | Mode used to write the data to the Neo4j database. The available options are: Cypher Query, Nodes, and Relationships. |
Upon selecting the Cypher Query option as write mode, the below field is reflected:
Cypher Query | Persist the entire dataset using the provided Cypher query, where each incoming record is referenced as event. Example: CREATE (n:Person {fullName: event.name + event.surname}). |
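Since Gathr pipelines run on Spark, a reasonable mental model for this write mode is a query-based write through the Neo4j Connector for Apache Spark. The following PySpark sketch is illustrative only: the connection URL, credentials, and sample columns are hypothetical placeholders, and mapping the field to the connector's "query" option is an assumption.

```python
# Illustrative sketch only: assumes the Cypher Query write mode behaves like the
# Neo4j Spark Connector's "query" option. URL and credentials are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("neo4j-cypher-write").getOrCreate()
df = spark.createDataFrame([("Jane", "Doe"), ("John", "Smith")], ["name", "surname"])

(df.write
    .format("org.neo4j.spark.DataSource")
    .mode("Append")
    .option("url", "bolt://localhost:7687")              # hypothetical endpoint
    .option("authentication.basic.username", "neo4j")    # hypothetical credentials
    .option("authentication.basic.password", "secret")
    .option("database", "neo4j")
    # Each incoming row is available to the query as `event`.
    .option("query", "CREATE (n:Person {fullName: event.name + ' ' + event.surname})")
    .save())
```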
Upon selecting the Nodes option as write mode, the below fields are reflected:
Nodes | Persist the entire dataset as nodes. The nodes are sent to Neo4j in batches of the number of rows defined in the Batch Size field. |
Node Keys | The key:value pairs, where the key is the DataFrame column name and the value is the node property name. |
Save Mode | Save Mode is used to specify the expected behavior of saving data to a data sink. Append: When persisting data, if the data/table already exists, contents of the DataFrame are expected to be appended to the existing data. Overwrite: When persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame. |
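For reference, the Nodes, Node Keys, Save Mode, and Batch Size fields line up with the node-write options of the Neo4j Connector for Apache Spark. A minimal PySpark sketch under that assumption, continuing the earlier example (labels, keys, and connection details are hypothetical):

```python
# Illustrative node write: assumes the Nodes write mode maps to the connector's
# "labels", "node.keys", and "batch.size" options.
df = spark.createDataFrame([("Jane", "Doe")], ["name", "surname"])

(df.write
    .format("org.neo4j.spark.DataSource")
    .mode("Overwrite")                          # Save Mode: Append or Overwrite
    .option("url", "bolt://localhost:7687")     # hypothetical endpoint
    .option("database", "neo4j")
    .option("labels", ":Person")                # node label(s) to persist
    .option("node.keys", "name:firstName")      # DataFrame column -> node property
    .option("batch.size", "5000")               # rows per round trip (Batch Size)
    .save())
```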
Upon selecting the Relationships option as write mode, the below fields are reflected:
Relationship | Gathr provides an option to define the type of relationship. For example, in a given dataset you can map columns to source and target nodes, eventually creating the specified relationship between them. |
Relationship Save Strategy | Save strategy to be used when writing relationships. |
Relationship Properties | Map used as keys for specifying the relationship properties. Used only if the keys option is selected as the Relationship Save Strategy. |
Source Labels | Colon-separated list of labels that identify the source node. |
Source Save Mode | Source Node save mode. |
Source Node Keys | Map used as keys for matching the source node. |
Source Node Properties | Map used as keys for specifying the source node properties. Used only if the keys option is selected as the Relationship Save Strategy. |
Target Labels | Colon-separated list of labels that identify the target node. |
Target Save Mode | Target Node save mode. |
Target Node Keys | Map used as keys for matching the target node. |
Target Node Properties | Map used as keys for specifying the target node properties. Used only if the keys option is selected as the Relationship Save Strategy. |
Save Mode | Save Mode is used to specify the expected behavior of saving data to a data sink. Append: When persisting data, if the data/table already exists, contents of the DataFrame are expected to be appended to the existing data. Overwrite: When persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame. |
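The relationship fields above similarly resemble the relationship-write options of the Neo4j Connector for Apache Spark. A hedged PySpark sketch under that assumption, continuing the earlier examples (relationship type, labels, keys, and columns are hypothetical):

```python
# Illustrative relationship write: assumes the Relationships write mode maps to
# the connector's "relationship.*" options. All names below are placeholders.
orders = spark.createDataFrame([(1, 101, 2)], ["customer_id", "product_id", "quantity"])

(orders.write
    .format("org.neo4j.spark.DataSource")
    .mode("Append")                                              # Save Mode
    .option("url", "bolt://localhost:7687")
    .option("database", "neo4j")
    .option("relationship", "BOUGHT")                            # Relationship
    .option("relationship.save.strategy", "keys")                # Relationship Save Strategy
    .option("relationship.properties", "quantity")               # Relationship Properties
    .option("relationship.source.labels", ":Customer")           # Source Labels
    .option("relationship.source.save.mode", "Overwrite")        # Source Save Mode
    .option("relationship.source.node.keys", "customer_id:id")   # Source Node Keys
    .option("relationship.target.labels", ":Product")            # Target Labels
    .option("relationship.target.save.mode", "Overwrite")        # Target Save Mode
    .option("relationship.target.node.keys", "product_id:id")    # Target Node Keys
    .save())
```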
The remaining fields are described below:
Output Fields | Fields of the message that need to be a part of the output data can be selected from the drop-down list. |
Batch Size | The number of rows sent to Neo4j in each batch. Default is 5000. |
Priority | Option to define the execution order for the emitter. |
In case of a streaming input dataset, the below options will be available:
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. |
Checkpoint Connections | Select a connection. The connections listed correspond to the selected storage location. |
Override Credential | Override credentials for user-specific actions. Upon checking this option, provide the Username, i.e., the name of the user through which the Hadoop services are running. |
Check Point Directory | The HDFS path where the Spark application stores the checkpoint data. |
Time-based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each pipeline run the checkpoint location provided above will be appended with the current time in milliseconds. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Trigger Type | Available options are: One-Time Micro Batch (a trigger that processes only one batch of data in a streaming query and then terminates the query) and Fixed-Interval Micro Batches (a trigger policy that runs a query periodically based on an interval in processing time; provide the Processing Time if you select this option). |
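In Spark Structured Streaming terms, the checkpoint and trigger fields above correspond to the writeStream checkpoint location and trigger policy. A minimal sketch, assuming a streaming write through the same connector (the source, checkpoint path, and label are hypothetical placeholders):

```python
# Illustrative streaming write: the checkpoint directory and trigger mirror the
# Check Point Directory and Trigger Type fields. Source, path, and label are
# placeholders for illustration only.
stream_df = spark.readStream.format("rate").load()   # stand-in streaming source

query = (stream_df.writeStream
    .format("org.neo4j.spark.DataSource")
    .outputMode("append")
    .option("url", "bolt://localhost:7687")
    .option("labels", ":Event")
    # Check Point Directory; a time-based checkpoint would suffix this path
    # with the current time in milliseconds on each run.
    .option("checkpointLocation", "hdfs:///tmp/neo4j-emitter-checkpoint")
    # Fixed-Interval Micro Batches; use .trigger(once=True) for One-Time Micro Batch.
    .trigger(processingTime="30 seconds")
    .start())

query.awaitTermination()
```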
If you have any feedback on Gathr documentation, please email us!