Register Entities

Register Entities lets you register custom components (custom parsers, data sources, and processors) for use in pipelines.

The following types of entities are available:

Register Component: Upload a customized jar to create a custom component that can be used in data pipelines.

Functions: A rich library of predefined and user-defined functions.

Variables: Use variables in your pipelines at runtime, according to their scope.

Calendar: Create multiple holiday calendars that can then be used in Workflow.


Each entity is explained below.

Register Components

Use Register Component to register a custom component (Channel or Processor) by uploading a customized jar. These custom components can then be used in data pipelines.

The Register Components tab is available under the Register Entities sidebar option.

Download a sample jar from the Data Pipelines page, customize it to your requirements, and upload it on the Register Components page.

Custom Code Implementation

Gathr allows you to implement custom code in the platform to extend its functionality for:

Channel: To read from any source.

Processor: To perform any operation on data-in-motion.

Custom code implementation supports importing custom components and versioning them.

You can download a Maven-based project that contains all the Gathr dependencies needed for writing custom code, along with sample code for reference.

Pre-requisites for custom code development

1. JDK 1.7 or higher

2. Apache Maven 3.x

3. Eclipse or any other IDE

Steps for Custom Code Implementation

Build Custom Code

Declare all the dependencies required by the custom components in the pom.xml available in the project.

• Build the project using mvn clean install.

• Use the resulting jar-with-dependencies.jar for component registration.

Register custom code

The list of custom components is displayed on the Register Components page. Its fields are described below:

Components: The icon of the component, which indicates whether it is a Data Source or a Processor.

Name: Name of the custom component.

Config: Config link of the component. You can add configuration to a custom component or upload a jar.

Engine: The supported engine, which is Spark.

Scope: The component can be used within a Project or across the Workspace.

Note: Define the scope of the component by selecting either Project or Workspace. If you select Workspace, the component can be used across the workspace. If you select Project, the component is visible only in that specific project.

Actions: Add Config (+), Upload Jar, and Delete.

Owner: Whether the custom component was created by a Superuser or a workspace user.

Version: The version number of the custom component.


You can perform the following operations on uploaded custom components:

• Change the scope of custom components (i.e. Global/Local).

• Change the icon of custom components.

• Add extra configuration properties.

• Update or delete registered custom components.

Version Support (Versioning) in component registration

Register multiple versions of a registered component and use any version in your pipeline.

Note:

- The listing page shows the details of each created component, such as Components, Owner, Parent Project (the project in which the component is registered), and Scope (Workspace/Project).

- If you use registered components in a pipeline, make sure that all components registered from a single jar are of the same version. Once a component has been registered with a fully qualified name, that component cannot be registered with another jar in the same workspace.

- If the same jar is uploaded with the same fully qualified name (FQN), a new version of that component is created.
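The versioning notes above can be pictured with a small model. This is only an illustrative sketch, not Gathr's implementation: the class ComponentRegistry and its register method are hypothetical, the jar is assumed to be identified by its file name, and version numbers are assumed to start at 1.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the versioning rules described above; not Gathr code.
class ComponentRegistry {
    // Per fully qualified name (FQN): the jar it was registered from and its latest version.
    private final Map<String, String> jarByFqn = new HashMap<>();
    private final Map<String, Integer> versionByFqn = new HashMap<>();

    // Registers a component and returns the version number assigned to it.
    int register(String fqn, String jarName) {
        String knownJar = jarByFqn.get(fqn);
        if (knownJar != null && !knownJar.equals(jarName)) {
            // The same FQN coming from a different jar is rejected within one workspace.
            throw new IllegalStateException(fqn + " is already registered from " + knownJar);
        }
        jarByFqn.put(fqn, jarName);
        // Same jar and same FQN: create the next version instead of a second component.
        return versionByFqn.merge(fqn, 1, Integer::sum);
    }
}
```

In this model, uploading the same jar twice for one FQN yields versions 1 and 2, while uploading a different jar for that FQN fails.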

Functions

Functions enable you to enrich an incoming message with additional data that is not provided by the source.

Function Name: Name of the function.

Arguments: The argument specifications of the function.

Scope: The function can be used within a Project or across the Workspace.

Note: Define the scope of the function by selecting either Project or Workspace. If you select Workspace, the function can be used across the workspace. If you select Project, the function is visible only in that specific project.

Parent Project: Parent project of the function being registered.

Owner: Name of the user who created the function.

Date Created: Creation date of the function.

Date Modified: Last modified date of the function.

Actions: Option to view more details about the function, such as Description, Parameters, Returns, Throws, and Example. Also includes the option to delete the registered function.


System Defined Functions

Gathr provides a rich library of system-defined functions as explained in the Functions section.

Variables

Variables allow you to use values in your pipelines at runtime, according to their scope.

To add a variable, click the Create New Variable (+) icon and provide the details explained below.

Name: Name of the variable.

Value: Value assigned to the variable (it can be an expression).

Data Type: The data type of the variable. The options are Number, Decimal, and String.

Scope: The scope of the variable. The following scope types are available:

• Project: If you select Project, the variable is scoped to the project.

• Workspace: If you select Workspace, the variable is scoped to all the topologies of the workspace.


For example, suppose you create the following variables: Name, Salary and Average.

Calling the following code in the implementation class retrieves all of these variables into varMap:

Map<String, ScopeVariable> varMap = (Map<String, ScopeVariable>) configMap.get(svMap);

To use the Name variable that you created, look it up in varMap with the following code. The returned variable object holds all the details of the variable: Name, Value, Data Type and Scope.

ScopeVariable variable = varMap.get(Name);

String value = variable.getValue();
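The lookup above can be wrapped with null checks so that a missing map or variable does not cause a NullPointerException. The sketch below is illustrative only: ScopeVariable here is a stand-in for the type shipped with the Gathr dependencies (only getValue() appears in this guide; the other accessors are assumed), the key "svMap" is assumed to be a plain string, and ScopeVariableLookup is a hypothetical helper.

```java
import java.util.Map;

// Stand-in for the platform's ScopeVariable type, so the sketch compiles on its own.
class ScopeVariable {
    private final String name;
    private final String value;
    private final String dataType;
    private final String scope;

    ScopeVariable(String name, String value, String dataType, String scope) {
        this.name = name;
        this.value = value;
        this.dataType = dataType;
        this.scope = scope;
    }

    String getName() { return name; }
    String getValue() { return value; }
    String getDataType() { return dataType; }   // assumed accessor
    String getScope() { return scope; }         // assumed accessor
}

// Hypothetical helper around the lookup shown above.
class ScopeVariableLookup {
    // Assumed key under which the variable map is stored in the configuration map.
    static final String SV_MAP_KEY = "svMap";

    // Returns the value of the named variable, or null if the map or the variable is missing.
    @SuppressWarnings("unchecked")
    static String lookupValue(Map<String, Object> configMap, String variableName) {
        Map<String, ScopeVariable> varMap =
                (Map<String, ScopeVariable>) configMap.get(SV_MAP_KEY);
        if (varMap == null) {
            return null;
        }
        ScopeVariable variable = varMap.get(variableName);
        return variable == null ? null : variable.getValue();
    }
}
```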

Variable listing page

Note: The listing page shows the details of each created variable, such as Name, Initial Value, Data Type, Parent Project (the project in which the variable is created), and Scope (Workspace/Project).

Scope Variable

You can add scope variables and then reuse and update them as needed in pipelines and pipeline components.

Scope variable support is available for the components below, in the listed fields, where the scope variables are populated using @.

Cobol (Data Source) --> copybookPath --> dataPath

HDFS (Data Source) --> File Path

Hive (Data Source) --> Query

JDBC (Data Source) --> Query

GCS (Batch and Streaming) (Data Source) --> File Path

File Writer (Emitter) --> File Path

The supported formats are shown below, each followed by the value it resolves to:

@{Pipeline.filepath} = /user/hdfs

@{Workspace.filepath} = /user/hdfs

@{Global.filepath}/JSON/demo.json = /user/hdfs/JSON/demo.json

@{Pipeline.filepath + '/JSON/demo.json'} = /user/hdfs/JSON/demo.json

@{Workspace.filepath + "/JSON/demo.json"} = /user/hdfs/JSON/demo.json

@{Global.lastdecimal + 4} = 14.0 // numeric addition
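To make the substitution concrete, here is a minimal sketch of how a simple reference such as @{Global.filepath} could be expanded. It is not Gathr's resolver: ScopeVariableResolver is a hypothetical class, the variables are assumed to be supplied in a map keyed by "Scope.name", and expressions containing + are deliberately left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical resolver for simple scope variable references; not Gathr code.
class ScopeVariableResolver {
    // Matches plain references such as @{Global.filepath}; expressions with '+' do not match.
    private static final Pattern REF =
            Pattern.compile("@\\{(Pipeline|Workspace|Global)\\.(\\w+)\\}");

    // variables is keyed by "Scope.name", for example "Global.filepath".
    static String resolve(String text, Map<String, String> variables) {
        Matcher m = REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1) + "." + m.group(2);
            String value = variables.get(key);
            // Unknown references are kept as written so the problem stays visible.
            String replacement = value != null ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With Global.filepath set to /user/hdfs, resolving @{Global.filepath}/JSON/demo.json gives /user/hdfs/JSON/demo.json, matching the third format listed above.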

Calendars

You can create holiday calendars from Register Entities > Calendar > Calendar listing page. Click the + icon to create a calendar.

Name: Name of the calendar.

Scope: Select Project or Workspace, which defines the scope of the calendar.

Note: Define the scope of the calendar by selecting either Project or Workspace. If you select Workspace, the calendar can be used across the workspace. If you select Project, the calendar is visible only in that specific project.

Timezone: Select the timezone for your calendar.

Date(s): Select the date(s) in your calendar to be marked as holidays.

Description: Add a description of the calendar.

Upload: Upload a text file (.txt) that contains date(s) in the format MM-DD-YYYY. If the file has multiple dates, each entry should be on a new line.
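Before uploading, it can help to check that every line of the text file really is a valid MM-DD-YYYY date. The following sketch does that with the standard java.time API. HolidayFileParser is a hypothetical helper, not part of Gathr, and skipping blank lines is an assumption made here for convenience.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-upload check for a holiday file with one MM-DD-YYYY date per line.
class HolidayFileParser {
    // "uuuu" with STRICT resolution rejects impossible dates such as 02-30-2025.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM-dd-uuuu").withResolverStyle(ResolverStyle.STRICT);

    // Parses the lines of the file; throws DateTimeParseException on the first invalid entry.
    static List<LocalDate> parse(List<String> lines) {
        List<LocalDate> dates = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            dates.add(LocalDate.parse(trimmed, FORMAT));
        }
        return dates;
    }
}
```

The lines can be read with java.nio.file.Files.readAllLines and passed to parse; an exception points to the first entry that does not match the expected format.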


Note: These calendars can be used in Workflow.

Calendar Listing:

Note: The listing page shows the details of each calendar, such as Name, Dates, Parent Project (the project in which the calendar is created), and Scope (Workspace/Project).