Neo4j ETL Source
In Gathr, the Neo4j connector can be added as a channel to fetch customer and prospect data and transform it as needed before storing it in the desired data warehouse for further analytics.
Schema Type
See the topic Provide Schema for ETL Source to learn how schema details can be provided for data sources.
After providing schema type details, the next step is to configure the data source.
Data Source Configuration
Configure the data source parameters as explained below.
Connection Name
Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Neo4j, or create one as explained in the topic Neo4j Connection.
Use the Test Connection option to verify that the connection to Neo4j is established successfully.
A success message states that the connection is available. If the test connection fails, edit the connection to resolve the issue before proceeding.
Batch Read
Check this option to enable batch processing.
Database Name
Name of the database from which data will be read.
Read Mode
Select one of the options below to fetch records from the Neo4j database.
The available options are:
Cypher Query
Node
Relationship
Cypher Query
Option to provide a Cypher query. Because this is a source, the query should read data and return the columns that you need.
Example: MATCH (n:Person) RETURN n.name AS name, n.surname AS surname
Node
You can read nodes by specifying a single label or multiple labels. Start the label list with a colon and separate the labels with colons. Example: :Person:Customer
Further options are available on selecting Node:
Schema Strategy
Strategy used by the connector in order to compute the schema definition for the dataset. Possible values are String and Sample.
Schema Flatten Limit
Number of records to be used to create the schema.
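As an illustration of the Node settings, the following minimal sketch assembles them into key-value pairs. It is not Gathr's implementation. The keys labels, schema.strategy, and schema.flatten.limit are borrowed from the Neo4j Connector for Apache Spark, and treating the Gathr fields as equivalent to them is an assumption. The helper build_node_options and its default values are hypothetical.

```python
def build_node_options(labels, schema_strategy="sample", flatten_limit=10):
    """Return Node read settings as string key-value pairs (hypothetical helper)."""
    if not labels:
        raise ValueError("at least one label is required")
    if schema_strategy not in ("sample", "string"):
        raise ValueError("schema strategy must be 'sample' or 'string'")
    # Drop any colon the caller already typed, then join with a leading colon
    # so the result matches the documented form, for example :Person:Customer.
    cleaned = [label.strip().lstrip(":") for label in labels]
    return {
        "labels": ":" + ":".join(cleaned),
        "schema.strategy": schema_strategy,            # assumed key name
        "schema.flatten.limit": str(flatten_limit),    # assumed key name
    }


node_options = build_node_options(["Person", "Customer"])
print(node_options["labels"])  # :Person:Customer
```

Building the label list in one place avoids the common mistake of omitting the starting colon.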
Relationship
Option to define the relationship type. Specify the mapping details, the source node label, and the target node label as explained below.
Mapping
Check this option to control the result format through the relationship.nodes.map property. The default is false.
Source Node
Provide the source node label. Example: in MATCH (source:Person)-[rel:BOUGHT]->(target:Product) RETURN source, rel, target, the source node is source:Person.
Target Node
Provide the target node label. Example: in MATCH (source:Person)-[rel:BOUGHT]->(target:Product) RETURN source, rel, target, the target node is target:Product.
Schema Strategy
Strategy used by the connector in order to compute the schema definition for the dataset. Possible values are String and Sample.
Schema Flatten Limit
Number of records to be used to create the schema.
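The Relationship settings can be pictured as the small set of key-value pairs sketched below, together with the Cypher pattern they describe. This is only a sketch: relationship.nodes.map is named in this topic, while relationship, relationship.source.labels, and relationship.target.labels are keys of the Neo4j Connector for Apache Spark that are assumed here to correspond to the Gathr fields. Both helper functions are hypothetical.

```python
def build_relationship_options(rel_type, source_label, target_label, mapping=False):
    """Return Relationship read settings as string key-value pairs (hypothetical helper)."""
    return {
        "relationship": rel_type,                                   # assumed key name
        # Leaving Mapping unchecked corresponds to the documented default of false.
        "relationship.nodes.map": "true" if mapping else "false",
        "relationship.source.labels": ":" + source_label.lstrip(":"),  # assumed key name
        "relationship.target.labels": ":" + target_label.lstrip(":"),  # assumed key name
    }


def describe_pattern(options):
    """Render the Cypher pattern that these settings describe."""
    return "MATCH (source{src})-[rel:{rel}]->(target{tgt}) RETURN source, rel, target".format(
        src=options["relationship.source.labels"],
        rel=options["relationship"],
        tgt=options["relationship.target.labels"],
    )


rel_options = build_relationship_options("BOUGHT", "Person", "Product")
print(describe_pattern(rel_options))
# MATCH (source:Person)-[rel:BOUGHT]->(target:Product) RETURN source, rel, target
```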
If you have unchecked the Batch Read option, the fields below appear for streaming datasets:
Streaming From
This option tells the connector from where to start sending data to the stream.
You can select NOW (starts reading from the current timestamp) or All (sends all the data already in the database to the stream before reading new data).
Incremental Read Property
The timestamp property name used for incremental reading.
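To make the interplay between Streaming From and Incremental Read Property concrete, the sketch below shows one way an incremental reader could work: it keeps an offset and asks only for nodes whose timestamp property is greater than that offset. This is an assumption about the general technique, not a description of Gathr internals. The function names, the $offset parameter, the use of epoch milliseconds, and the -1 starting offset are all hypothetical.

```python
import time


def initial_offset(streaming_from, now_ms=None):
    """Pick the starting offset for the first read (hypothetical helper)."""
    mode = streaming_from.upper()
    if mode == "NOW":
        # Start at the current time so only newer records are read.
        return now_ms if now_ms is not None else int(time.time() * 1000)
    if mode == "ALL":
        # Start before any valid timestamp so existing records are read first.
        return -1
    raise ValueError("streaming_from must be NOW or ALL")


def incremental_query(label, timestamp_property):
    """Build a Cypher query that returns only nodes newer than $offset."""
    return (
        f"MATCH (n:{label}) WHERE n.{timestamp_property} > $offset "
        f"RETURN n ORDER BY n.{timestamp_property}"
    )


print(incremental_query("Person", "updatedAt"))
# MATCH (n:Person) WHERE n.updatedAt > $offset RETURN n ORDER BY n.updatedAt
```

After each read, the reader would advance the offset to the largest timestamp it has seen, so the next read picks up where the previous one stopped.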
Partitions
This defines the parallelization level while pulling data from Neo4j.
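One common way to read in parallel is to split the result set into contiguous slices and let each partition read its own slice with SKIP and LIMIT. The sketch below computes such slices. Whether Gathr partitions data this way is an assumption, and partition_ranges is a hypothetical helper.

```python
def partition_ranges(total_records, partitions):
    """Split total_records into (skip, limit) pairs, one per partition."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    size, remainder = divmod(total_records, partitions)
    ranges = []
    skip = 0
    for index in range(partitions):
        # The first `remainder` partitions take one extra record each.
        limit = size + (1 if index < remainder else 0)
        ranges.append((skip, limit))
        skip += limit
    return ranges


print(partition_ranges(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```

A higher partition count increases parallelism but also the number of concurrent queries sent to Neo4j, so the value is best chosen with the capacity of the database in mind.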
Add Configuration
Additional properties can be added using this option as key-value pairs.
Detect Schema
Check the populated schema details. For more details, see the topic Schema Preview.
Pre Action
To understand how to provide SQL queries or stored procedures that are executed during the pipeline run, see the topic Pre-Actions.
Notes
Optionally, enter notes in the Notes tab and save the configuration.