Neo4j
The Neo4j emitter enables you to write both batch and streaming input datasets to a configured database of a Neo4j service. To add a Neo4j emitter to your pipeline, drag and drop the emitter onto the canvas, connect it to a data source or processor, and click on it to configure it.
Field | Description |
---|---|
Connection Name | All the Neo4j connections will be listed here. Select a connection for connecting to Neo4j. |
Database Name | Name of the Neo4j database to which the data will be written. |
Write Mode | Mode used to write the data to the Neo4j database. The available options are: Cypher Query, Nodes, and Relationships. |
Upon selecting the Cypher Query option as write mode, the below field is reflected:
Cypher Query | Persist the entire dataset using the provided Cypher query, where each incoming record is referenced as event. Example: CREATE (n:Person {fullName: event.name + event.surname}). |
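Since Gathr pipelines run on Spark, a reasonable mental model for this write mode is a query-based write through the Neo4j Connector for Apache Spark. The following PySpark sketch is illustrative only: the connection URL, credentials, and sample columns are hypothetical placeholders, and mapping the field to the connector's "query" option is an assumption.

```python
# Illustrative sketch only: assumes the Cypher Query write mode behaves like the
# Neo4j Spark Connector's "query" option. URL and credentials are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("neo4j-cypher-write").getOrCreate()
df = spark.createDataFrame([("Jane", "Doe"), ("John", "Smith")], ["name", "surname"])

(df.write
    .format("org.neo4j.spark.DataSource")
    .mode("Append")
    .option("url", "bolt://localhost:7687")              # hypothetical endpoint
    .option("authentication.basic.username", "neo4j")    # hypothetical credentials
    .option("authentication.basic.password", "secret")
    .option("database", "neo4j")
    # Each incoming row is available to the query as `event`.
    .option("query", "CREATE (n:Person {fullName: event.name + ' ' + event.surname})")
    .save())
```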
Upon selecting the Nodes option as write mode, the below fields are reflected:
Nodes | Persist the entire dataset as nodes. The nodes are sent to Neo4j in batches of the number of rows defined in the Batch Size field. |
Node Keys | The key:value pairs, where the key is the DataFrame column name and the value is the node property name. |
Save Mode | Save Mode is used to specify the expected behavior of saving data to a data sink. Append: When persisting data, if the data/table already exists, contents of the DataFrame are expected to be appended to the existing data. Overwrite: When persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame. |
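For reference, the Nodes, Node Keys, Save Mode, and Batch Size fields line up with the node-write options of the Neo4j Connector for Apache Spark. A minimal PySpark sketch under that assumption, continuing the earlier example (labels, keys, and connection details are hypothetical):

```python
# Illustrative node write: assumes the Nodes write mode maps to the connector's
# "labels", "node.keys", and "batch.size" options.
df = spark.createDataFrame([("Jane", "Doe")], ["name", "surname"])

(df.write
    .format("org.neo4j.spark.DataSource")
    .mode("Overwrite")                          # Save Mode: Append or Overwrite
    .option("url", "bolt://localhost:7687")     # hypothetical endpoint
    .option("database", "neo4j")
    .option("labels", ":Person")                # node label(s) to persist
    .option("node.keys", "name:firstName")      # DataFrame column -> node property
    .option("batch.size", "5000")               # rows per round trip (Batch Size)
    .save())
```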
Upon selecting the Relationships option as write mode, the below fields are reflected:
Relationship | Gathr provides an option to define the type of relationship. For example, in a given dataset you can map columns to source and target nodes, eventually creating the specified relationship between them. |
Relationship Save Strategy | Save strategy to be used when writing relationships. |
Relationship Properties | Map used as keys for specifying the relationship properties. Used only if the keys option is selected as the Relationship Save Strategy. |
Source Labels | Colon-separated list of labels that identify the source node. |
Source Save Mode | Source Node save mode. |
Source Node Keys | Map used as keys for matching the source node. |
Source Node Properties | Map used as keys for specifying the source node properties. Used only if the keys option is selected as the Relationship Save Strategy. |
Target Labels | Colon-separated list of labels that identify the target node. |
Target Save Mode | Target Node save mode. |
Target Node Keys | Map used as keys for matching the target node. |
Target Node Properties | Map used as keys for specifying the target node properties. Used only if the keys option is selected as the Relationship Save Strategy. |
Save Mode | Save Mode is used to specify the expected behavior of saving data to a data sink. Append: When persisting data, if the data/table already exists, contents of the DataFrame are expected to be appended to the existing data. Overwrite: When persisting data, if the data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame. |
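The relationship fields above similarly resemble the relationship-write options of the Neo4j Connector for Apache Spark. A hedged PySpark sketch under that assumption, continuing the earlier examples (relationship type, labels, keys, and columns are hypothetical):

```python
# Illustrative relationship write: assumes the Relationships write mode maps to
# the connector's "relationship.*" options. All names below are placeholders.
orders = spark.createDataFrame([(1, 101, 2)], ["customer_id", "product_id", "quantity"])

(orders.write
    .format("org.neo4j.spark.DataSource")
    .mode("Append")                                              # Save Mode
    .option("url", "bolt://localhost:7687")
    .option("database", "neo4j")
    .option("relationship", "BOUGHT")                            # Relationship
    .option("relationship.save.strategy", "keys")                # Relationship Save Strategy
    .option("relationship.properties", "quantity")               # Relationship Properties
    .option("relationship.source.labels", ":Customer")           # Source Labels
    .option("relationship.source.save.mode", "Overwrite")        # Source Save Mode
    .option("relationship.source.node.keys", "customer_id:id")   # Source Node Keys
    .option("relationship.target.labels", ":Product")            # Target Labels
    .option("relationship.target.save.mode", "Overwrite")        # Target Save Mode
    .option("relationship.target.node.keys", "product_id:id")    # Target Node Keys
    .save())
```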
The remaining fields are described below:
Output Fields | Fields of the message that need to be a part of the output data can be selected from the drop-down list. |
Batch Size | The number of rows sent to Neo4j in each batch. Default is 5000. |
Priority | Option to define the execution order for the emitter. |
In case of a streaming input dataset, the below options will be available:
Checkpoint Storage Location | Select the checkpointing storage location. Available options are HDFS, S3, and EFS. |
Checkpoint Connections | Select a connection. The connections listed correspond to the selected storage location. |
Override Credential | Override credentials for user-specific actions. Upon checking this option, provide the Username, i.e., the name of the user through which the Hadoop services are running. |
Check Point Directory | The HDFS path where the Spark application stores the checkpoint data. |
Time-based Check Point | Select the checkbox to enable a time-based checkpoint on each pipeline run, i.e., on each pipeline run the checkpoint location provided above will be appended with the current time in milliseconds. |
Enable Trigger | Trigger defines how frequently a streaming query should be executed. |
Trigger Type | Available options are: One-Time Micro Batch (a trigger that processes only one batch of data in a streaming query and then terminates the query) and Fixed-Interval Micro Batches (a trigger policy that runs a query periodically based on an interval in processing time; provide the Processing Time if you select this option). |
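In Spark Structured Streaming terms, the checkpoint and trigger fields above correspond to the writeStream checkpoint location and trigger policy. A minimal sketch, assuming a streaming write through the same connector (the source, checkpoint path, and label are hypothetical placeholders):

```python
# Illustrative streaming write: the checkpoint directory and trigger mirror the
# Check Point Directory and Trigger Type fields. Source, path, and label are
# placeholders for illustration only.
stream_df = spark.readStream.format("rate").load()   # stand-in streaming source

query = (stream_df.writeStream
    .format("org.neo4j.spark.DataSource")
    .outputMode("append")
    .option("url", "bolt://localhost:7687")
    .option("labels", ":Event")
    # Check Point Directory; a time-based checkpoint would suffix this path
    # with the current time in milliseconds on each run.
    .option("checkpointLocation", "hdfs:///tmp/neo4j-emitter-checkpoint")
    # Fixed-Interval Micro Batches; use .trigger(once=True) for One-Time Micro Batch.
    .trigger(processingTime="30 seconds")
    .start())

query.awaitTermination()
```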
If you have any feedback on Gathr documentation, please email us!