Neo4j Ingestion Source

In Gathr, a Neo4j connector can be added as a channel to fetch customers' and prospects' data and transform it as needed. The data can then be stored in a data warehouse of your choice for further analytics.

Data Source Configuration

Configure the data source parameters as explained below.

Fetch From Source/Upload Data File

To design the application, you can either fetch the sample data from the Neo4j source by providing the data source connection details or upload a sample data file in one of the supported formats to see the schema details during the application design phase.

Upload Data File

To design the application, please upload a data file containing sample records in a format supported by Gathr.

The sample data provided for application design should match the data source schema from which data will be fetched during runtime.

If the Upload Data File method is selected to design the application, provide the details below.

File Format

Select the format of the sample file depending on the file type.

Gathr-supported file formats for Neo4j data sources are CSV, JSON, TEXT, Parquet and ORC.

For CSV file format, select its corresponding delimiter.

Header Included

Enable this option to read the first row as a header if your Neo4j sample data file is in CSV format.
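The following sketch shows how the Header Included option changes the column names that are read from a CSV sample. It uses Python's standard csv module rather than Gathr's parser. It writes a few records to CSV with a semicolon delimiter and reads them back, first with the first row treated as a header and then without. The record fields (name, age) and the positional placeholder names are hypothetical and for illustration only.

```python
import csv
import io

# Hypothetical sample records; real field names should match the Neo4j schema read at runtime.
records = [
    {"name": "Alice", "age": "34"},
    {"name": "Bob", "age": "29"},
]

def to_csv(rows, delimiter=";"):
    """Serialize rows to CSV text, writing a header line as the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def read_columns(text, delimiter=";", header_included=True):
    """Return column names: taken from the first row if it is a header, else placeholders."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    first_row = next(reader)
    if header_included:
        return first_row
    # Without a header the first row is data, so only positional names are available.
    return [f"_c{i}" for i in range(len(first_row))]

sample = to_csv(records)
print(read_columns(sample))                         # ['name', 'age']
print(read_columns(sample, header_included=False))  # ['_c0', '_c1']
```

If the header option does not match the file, either the first data row is lost to the header or the header line is read as data. Both mistakes show up in the schema preview.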

Upload

Please upload the sample file as per the file format selected above.

👉

Make sure that the file size does not exceed 10 MB.
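Gathr validates uploads itself. You can optionally pre-check a sample file locally against the listed formats and the size limit, as in the sketch below. It makes two assumptions: 10 MB is taken to mean 10 * 1024 * 1024 bytes, and the format is inferred from the file extension (.txt is assumed for TEXT). The helper name check_sample_file is hypothetical.

```python
from pathlib import Path

# Assumptions: 10 MB = 10 * 1024 * 1024 bytes; format is inferred from the extension.
MAX_SAMPLE_BYTES = 10 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".csv", ".json", ".txt", ".parquet", ".orc"}

def check_sample_file(name: str, size_bytes: int) -> list:
    """Return a list of problems found; an empty list means the file looks uploadable."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes > MAX_SAMPLE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_SAMPLE_BYTES}")
    return problems

print(check_sample_file("customers.csv", 2_000_000))  # []
```

For a file on disk, the size can be taken from Path(name).stat().st_size.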

Fetch From Source

If the Fetch From Source method is selected to design the application, the data source connection details are used to fetch sample data.

Continue to configure the data source.

Connection Name

Connections are the service identifiers. Select a connection name from the list if you have previously created and saved Neo4j connection details, or create one as explained in the Neo4j Connection topic.

Use the Test Connection option to ensure that the connection with the Neo4j channel is established successfully.

A success message states that the connection is available. In case of any error in test connection, edit the connection to resolve the issue before proceeding further.

Batch Read

Enable this option to process data in batch mode.

Database Name

Name of the database from which data will be read.

Read Mode

Select one of the options below to fetch records from the Neo4j database.

The available options are:

Cypher Query
Node
Relationship

Cypher Query

Provide the Cypher query used to read data from the database. The query should return the records to be ingested.

Example: MATCH (p:Person) RETURN p.name AS name, p.age AS age
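Gathr does not document how this field is passed to Neo4j. The sketch below assumes it corresponds to the query option of the Neo4j Connector for Apache Spark, with Database Name corresponding to that connector's database option. Under that assumption, it builds the option map for a query-based read. The helper name and the bolt://localhost:7687 URL are hypothetical, and authentication options are left out because they come from the saved connection.

```python
def cypher_query_options(url: str, database: str, query: str) -> dict:
    """Build an option map for a query-based read (assumed Neo4j Spark Connector names)."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return {
        "url": url,            # hypothetical endpoint; normally comes from the connection
        "database": database,  # corresponds to the Database Name field
        "query": query,        # corresponds to the Cypher Query field; should return rows
    }

opts = cypher_query_options(
    "bolt://localhost:7687",
    "neo4j",
    "MATCH (p:Person) RETURN p.name AS name, p.age AS age",
)
print(opts["query"])
```

In a standalone PySpark job, such a map could be passed as spark.read.format("org.neo4j.spark.DataSource").options(**opts).load(). In Gathr, the same settings are entered through the form fields instead.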

Node

You can read nodes by specifying a single label or multiple labels. Write a label list with a colon before each label, for example :Person:Customer.
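The label list syntax can be produced from plain label names with the small hypothetical helper below. It tolerates labels that already carry a leading colon. In Cypher, the pattern (n:Person:Customer) matches nodes that carry both labels, and a label list such as :Person:Customer presumably selects nodes the same way.

```python
def label_list(*labels: str) -> str:
    """Join labels into the colon-prefixed form, e.g. :Person:Customer."""
    # Strip whitespace and any leading colon so ":Person" and "Person" are treated alike.
    cleaned = [label.strip().lstrip(":") for label in labels if label.strip()]
    if not cleaned:
        raise ValueError("at least one label is required")
    return "".join(f":{label}" for label in cleaned)

print(label_list("Person", "Customer"))  # :Person:Customer
```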

Further options are available on selecting Node:

Schema Strategy

Strategy used by the connector in order to compute the schema definition for the dataset. Possible values are String and Sample.

Schema Flatten Limit

Number of records to be used to create the schema.
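The Neo4j Connector for Apache Spark exposes similarly named options, schema.strategy and schema.flatten.limit. As we understand that connector, "string" reads every property as a string, while "sample" infers types from a sample of records whose size is set by the flatten limit. The sketch below assumes Gathr's Node fields correspond to those options and builds the option map with basic validation. The helper name and the limit of 50 are illustrative.

```python
VALID_STRATEGIES = {"string", "sample"}

def node_read_options(labels: str, strategy: str = "sample", flatten_limit: int = 10) -> dict:
    """Build an option map for reading nodes (assumed Neo4j Spark Connector option names)."""
    strategy = strategy.lower()  # the UI shows "String"/"Sample"; the connector uses lowercase
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"schema strategy must be one of {sorted(VALID_STRATEGIES)}")
    if flatten_limit < 1:
        raise ValueError("schema flatten limit must be a positive number of records")
    return {
        "labels": labels,
        "schema.strategy": strategy,
        "schema.flatten.limit": str(flatten_limit),  # option values are passed as strings
    }

opts = node_read_options(":Person:Customer", "Sample", 50)
print(opts)
```

A larger flatten limit samples more records, which gives a more complete schema when properties vary across nodes. The trade-off is that schema inference takes longer.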

Relationship

Specify the type of relationship to read. Also configure the mapping option and the source and target node labels, as explained below.

Mapping

Enable this option to control the result format through the relationship.nodes.map setting. The default value is false.

👉

When it is set to false, the source and target node properties are returned in separate columns prefixed with source or target (for example, source.name and target.price). When it is set to true, the source and target node properties are returned as Map[String, String] values in two columns named source and target.
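The simplified illustration below shows the two result shapes described above for a single row. The property values are hypothetical. The illustration covers only node properties and omits relationship properties and any metadata columns the connector may add.

```python
def relationship_row(source: dict, target: dict, nodes_map: bool) -> dict:
    """Simplified shape of one result row for each relationship.nodes.map setting."""
    if nodes_map:
        # true: one map-valued column per node, with values rendered as strings.
        return {
            "source": {k: str(v) for k, v in source.items()},
            "target": {k: str(v) for k, v in target.items()},
        }
    # false: one column per property, prefixed with the node's role.
    row = {f"source.{k}": v for k, v in source.items()}
    row.update({f"target.{k}": v for k, v in target.items()})
    return row

person = {"name": "Alice"}
product = {"name": "Laptop", "price": 999.0}
print(relationship_row(person, product, nodes_map=False))
# {'source.name': 'Alice', 'target.name': 'Laptop', 'target.price': 999.0}
print(relationship_row(person, product, nodes_map=True))
# {'source': {'name': 'Alice'}, 'target': {'name': 'Laptop', 'price': '999.0'}}
```

Separate columns keep the original property types and are easier to select in later steps. Map columns keep the schema stable when nodes have different property sets, because every value becomes a string.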

Source Node

Provide the label of the source node. For example, in MATCH (source:Person)-[rel:BOUGHT]->(target:Product) RETURN source, rel, target, the source node label is :Person.

Target Node

Provide the label of the target node. For example, in MATCH (source:Person)-[rel:BOUGHT]->(target:Product) RETURN source, rel, target, the target node label is :Product.
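The sketch below assumes the Relationship fields correspond to the Neo4j Connector for Apache Spark options relationship, relationship.source.labels, relationship.target.labels and relationship.nodes.map. The option names come from that connector; their correspondence to Gathr's form fields is an assumption. It builds the option map for the BOUGHT example above, which reads the same data as MATCH (source:Person)-[rel:BOUGHT]->(target:Product) RETURN source, rel, target. The helper name is hypothetical.

```python
def relationship_read_options(rel_type: str, source_labels: str, target_labels: str,
                              nodes_map: bool = False) -> dict:
    """Build an option map for reading relationships (assumed connector option names)."""
    for value in (rel_type, source_labels, target_labels):
        if not value.strip():
            raise ValueError("relationship type and node labels are all required")
    return {
        "relationship": rel_type,                    # relationship type, e.g. BOUGHT
        "relationship.source.labels": source_labels,  # Source Node field
        "relationship.target.labels": target_labels,  # Target Node field
        "relationship.nodes.map": str(nodes_map).lower(),  # Mapping checkbox as "true"/"false"
    }

opts = relationship_read_options("BOUGHT", ":Person", ":Product")
print(opts)
```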

Schema Strategy

Strategy used by the connector in order to compute the schema definition for the dataset. Possible values are String and Sample.

Schema Flatten Limit

Number of records to be used to create the schema.

If you have unchecked the Batch Read option, the fields below appear for streaming datasets:

Streaming From

Specifies the point from which the connector starts sending data to the stream.

Select NOW to start reading from the current timestamp, or ALL to send all data already in the database to the stream before reading new data.

Incremental Read Property

The timestamp property name used for incremental reading.
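Both streaming fields appear to correspond to the Neo4j Connector for Apache Spark options streaming.from (NOW or ALL) and streaming.property.name. That correspondence is an assumption, and the sketch below builds the option map on that basis. The label :Event and the property name updatedAt are hypothetical. The property is expected to hold a timestamp that is set whenever a record is written, so that newer records can be found on each poll.

```python
VALID_START_POINTS = {"NOW", "ALL"}

def streaming_options(labels: str, timestamp_property: str, start_from: str = "NOW") -> dict:
    """Build an option map for a streaming read (assumed Neo4j Spark Connector option names)."""
    start = start_from.upper()  # the UI shows "NOW"/"All"; normalize to upper case
    if start not in VALID_START_POINTS:
        raise ValueError("streaming start point must be NOW or ALL")
    if not timestamp_property.strip():
        raise ValueError("an incremental read (timestamp) property is required")
    return {
        "labels": labels,
        "streaming.from": start,                      # Streaming From field
        "streaming.property.name": timestamp_property,  # Incremental Read Property field
    }

opts = streaming_options(":Event", "updatedAt", "All")
print(opts)
```

In a standalone PySpark job, such a map would go to spark.readStream rather than spark.read.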

Partitions

This defines the parallelization level while pulling data from Neo4j.

Add Configuration: Additional properties can be added using this option as key-value pairs.
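The sketch below shows one way the partition count and additional key-value pairs could be combined with the options set through the form. It assumes partitions corresponds to the Neo4j Spark Connector option of the same name. The key my.custom.option is a hypothetical placeholder, not a real connector option. As a design choice, the sketch rejects an extra pair that would silently override an option already set.

```python
from typing import Dict, Optional

def with_partitions_and_extras(base: Dict[str, str], partitions: int,
                               extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a new option map with the partition count and extra key-value pairs added."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    merged = dict(base)  # copy so the caller's map is not modified
    merged["partitions"] = str(partitions)
    for key, value in (extra or {}).items():
        # Refuse to let an additional property override a value set elsewhere.
        if key in merged:
            raise ValueError(f"option {key!r} is already set")
        merged[key] = str(value)
    return merged

base = {"labels": ":Person"}
opts = with_partitions_and_extras(base, 4, {"my.custom.option": "value"})  # hypothetical key
print(opts)
# {'labels': ':Person', 'partitions': '4', 'my.custom.option': 'value'}
```

More partitions let Spark pull data from Neo4j in parallel. Each partition is a separate read against the database, though, so the count is usually tuned to the cluster and database capacity.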

Schema

Check the populated schema details. For more details, see Schema Preview.
