Create a Workflow

To create a workflow, do as follows:

Click the Create Application button.

Workflows_1

You will be redirected to the Workflow design canvas.

Workflow_Canvas

This page provides the nodes used to create and execute a workflow. They are explained below:

Nodes

To define a workflow, four types of nodes are available:

  • Control Nodes

  • Pipelines

  • Actions

  • Workflow

Add a control node with one or more pipelines, and apply actions to them. Save your workflow. Once workflows are saved, you can also concatenate them from the Workflow tab.


Control Nodes

Two types of control nodes are available:

Workflow_Control_Nodes

Dummy: This node controls the flow of the workflow based on trigger conditions, or groups tasks defined in a workflow.

Start: This node is mandatory for defining a workflow and represents its logical start. You can use only one Start node in a workflow.


Application Nodes

All the Ingestion and ETL Applications that you have created are listed here.

Workflow_Existing_Pipelines


Action Nodes

Action nodes add the functionality of the following actions to a workflow:

Workflow_Actions


Prerequisites

For the database operators to run successfully in Gathr, the Gathr IP must be in the following format:

ec2-3-13-24-114.<aws-region>.compute.amazonaws.com


BIGQUERY Operator

This operator is used to execute a stored procedure (SP) in a BigQuery database.

Following are the properties under the configuration:

Connection ID: Create a new BigQuery connection or select a connection ID from the drop-down which is defined in Airflow to connect with the required BigQuery database.

To know more about how to create BigQuery connection, see BigQuery Connection →

Query: Provide the SQL queries that will be used to perform operations.

Retries: Number of times workflow tries to run this task in case of failure.

Trigger Rule: Rules to define trigger conditions of this node.

SLA: Provide SLA related details (if required).
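
For example, the Query field could invoke a stored procedure as follows. This is a minimal sketch; the project, dataset, procedure, and argument below are placeholders for illustration:

    -- Placeholder project, dataset, and procedure names.
    CALL `my_project.my_dataset.refresh_daily_totals`('2024-01-01');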


dbtCore Operator

Git Connection: Select the Git connection.

Note: Git Connection can be created from Gathr’s main menu > Connections page.

Git Branch: Select the branch of the dbt project.

Warehouse: Select the warehouse or database where the models will be executed.

  • Postgres

  • BigQuery

  • Databricks

  • Redshift

  • Snowflake

Connection: Select the warehouse connection if already created, or create one from Gathr’s main menu > Connections page.

Profile Configuration: The profile configuration is auto-populated with connection details and default values. Provide any optional profile parameters required for the model execution.

Profile Name: Specify the name of the profile to be executed.

Note: Profile Name in the project’s dbt_project.yml file should match with the Profile Name specified here.
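
For illustration, a minimal sketch of the matching key in dbt_project.yml (the project and profile names below are placeholders):

    # dbt_project.yml -- placeholder names for illustration
    name: my_dbt_project
    profile: analytics_profile  # must match the Profile Name entered here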

Target: Specify the target of the project.

DBT Commands: Select or add the dbt commands to be executed; optional command arguments can also be added. Example: --debug, --warn-error, --fail-fast. The commands will be executed sequentially.

Environment Variables: Specify the environment variable key (name) and its corresponding value.

Dependencies: Enter the list of Python dependencies required for your DBT Core project.

These libraries will be installed automatically.

Example:

urllib3==1.26.16
numpy
pandas==2.1.0

Retries: Number of times workflow tries to run this task in case of failure.

Trigger Rule: Rules to define trigger conditions of this node.

SLA: Provide SLA related details (if required).


HTTP Operator

The HTTP Operator in Gathr Workflows facilitates communication with an HTTP system by sending requests to specific endpoints.

This operator proves valuable for executing actions, making it an essential component in designing dynamic and interactive workflows.

Configuration Properties:

Connection ID: Select a connection ID from the drop-down, defined in Airflow, to connect with the required HTTP system.

Connections are the service identifiers. A connection ID can be selected from the list if you have created and saved connection details for the HTTP operator earlier, or you can create one as explained in the topic HTTP Operator Connection ID.

End Point: Specify the endpoint on the HTTP system to trigger the desired action.

Request Data: For POST or PUT requests, provide data in JSON format. For GET requests, use a dictionary of key/value string pairs.

  • GET Example:

    https://10.xx.xx.238:xxx/Gathr/datafabric/func/list?projectName=XXX_1000068&projectVersion=1&projectDisplayName=xxxx&pageView=pipeline
    
  • POST Example:

    {"connectionName": "JDBC_MYSQL_AUTO", "connectionId": "xxxxx", "componentType": "jdbc", "tenantId": "xxx", "connectionJson": {"databaseType": "mysql", "driver": "", "connectionURL": "", "driverType": "", "databaseName": "xxx", "host": "xxx", "port": "xxx", "username": "xxxx", "password": "xxxxx"}}
    

Header: Include HTTP headers in JSON format to be added to the request.
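
For example, a typical header payload might look like this (the token value is a placeholder):

    {"Content-Type": "application/json", "Authorization": "Bearer <token>"}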

Retries: Define the number of attempts the workflow should make to run the task in case of failure.

Trigger Rule: Set a rule to determine the trigger condition for this node.

SLA: Optionally provide SLA-related details if required for task execution.


HTTP Operator Connection ID

Connection ID: Assign a unique and identifiable name for the connection ID.

Connection Type: Select either “Direct” for a direct connection or “SSH” for a secure connection via SSH.

SSH Key: If using SSH, provide the SSH key for authentication.

SSH Host: Specify the destination host address when using SSH for connection.

SSH User: Input the SSH username for authentication.

Host: Enter the host address for a direct connection.

Port: Define the port number for the connection.

Schema: Provide the schema information if applicable.

Login: Input the login or username for authentication.

Password: Enter the password for authentication.

Extra: Additional parameters or information relevant to the connection.
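
Extra parameters are provided as a JSON object of key/value pairs. The keys below are placeholders; which keys are honored depends on the connection type:

    {"key1": "value1", "key2": "value2"}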


MSSQL Operator

This operator is used to execute SQL statements on a Microsoft SQL Server database.

Connection ID: Create a new JDBC connection or select a connection ID from the drop-down which is defined in Airflow to connect with the required MSSQL database.

To know more about how to create JDBC connection, see JDBC Connection →

Query: SQL queries used to perform operations.

Retries: Number of times workflow tries to run this task in case of failure.

Trigger Rule: Rules to define trigger conditions of this node.

SLA: Provide SLA related details (if required).
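
For example, the Query field could contain a T-SQL statement such as the following (the table and column names are placeholders):

    -- Placeholder table and column names.
    UPDATE dbo.orders SET status = 'PROCESSED' WHERE status = 'PENDING';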


MySQL Operator

This operator is used to execute SQL statements on a MySQL database.

Connection ID: Create a new JDBC connection or select a connection ID from the drop-down which is defined in Airflow to connect with the required MySQL database.

To know more about how to create JDBC connection, see JDBC Connection →

If the MySQL database is SSL enabled, then you can enable the SSL option in Gathr.

This option determines whether or with what priority a secure SSL TCP/IP connection will be negotiated with the server. There are five modes:

PREFERRED: First try an SSL connection; if that fails, try a non-SSL connection.

DISABLED: Only try a non-SSL connection.

REQUIRED: Only try an SSL connection.

VERIFY_CA: Only try an SSL connection, and verify that the server certificate is issued by a trusted certificate authority (CA).

VERIFY_IDENTITY: Only try an SSL connection, verify that the server certificate is issued by a trusted CA and that the requested server host name matches that in the certificate.


Continue with the operator’s configuration:

Query: SQL queries used to perform operations.

Retries: Number of times workflow tries to run this task in case of failure.

Trigger Rule: Rules to define trigger conditions of this node.

SLA: Provide SLA related details (if required).
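
For example, the Query field could contain a statement such as the following (the table and column names are placeholders):

    -- Placeholder table and column names.
    INSERT INTO audit_log (event, created_at) VALUES ('workflow_run', NOW());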


Pause Operator

The Pause Operator is used to pause the current workflow.

Following are the properties under the configuration:

Retries: Number of times workflow tries to run this task in case of failure.

Trigger Rule: Rules to define trigger conditions of this node.

SLA: Provide SLA related details (if required).


Pipeline Operator

The Pipeline operator is used to run a selected pipeline. Its function is the same as that of a Pipeline node.

Set the following configurations for this operator:

Pipeline To Run: Select the pipeline that this node should run.

Retries: Number of times workflow tries to run this task in case of failure.

Trigger Rule: Rules to define trigger conditions of this node.

SLA: Provide SLA related details (if required).


POSTGRES Operator

This operator is used to execute SQL statements on a PostgreSQL database.

Connection ID: Create a new JDBC connection or select a connection ID from the drop-down which is defined in Airflow to connect with the required Postgres database.

To know more about how to create JDBC connection, see JDBC Connection →

If the Postgres database is SSL enabled, then you can enable the SSL option in Gathr.

This option determines whether or with what priority a secure SSL TCP/IP connection will be negotiated with the server. There are six modes:

allow: First try a non-SSL connection; if that fails, try an SSL connection.

prefer: First try an SSL connection; if that fails, try a non-SSL connection.

disable: Only try a non-SSL connection.

require: Only try an SSL connection. If a root CA file is present, verify the certificate in the same way as if verify-ca was specified.

verify-ca: Only try an SSL connection, and verify that the server certificate is issued by a trusted certificate authority (CA).

verify-full: Only try an SSL connection, verify that the server certificate is issued by a trusted CA and that the requested server host name matches that in the certificate.

Continue with the operator’s configuration:

Query: SQL queries used to perform operations.

Retries: Number of times workflow tries to run this task in case of failure.

Trigger Rule: Rules to define trigger conditions of this node.

SLA: Provide SLA related details (if required).
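
For example, the Query field could contain a statement such as the following (the table and column names are placeholders):

    -- Placeholder table and column names.
    DELETE FROM staging_events WHERE event_date < CURRENT_DATE - INTERVAL '30 days';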


SQL Sensor

The SQL Sensor runs the SQL statement once per poke interval until the first cell of the result is no longer in (0, '0', ''), or until the time-out interval elapses.

Connection ID: Select a connection ID from the drop-down, defined in Airflow, to connect with the required SQL database.

Time Out Interval: Maximum time (in seconds) for which the sensor will keep checking once triggered.

Poke Interval: Time interval (in seconds) that the sensor waits between tries.

Query: SQL queries used to perform operations.

Retries: Number of times the workflow tries to run this task in case of failure.

Trigger Rule: Rules to define trigger conditions of this node.

SLA: Provide SLA related details (if required).
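
For example, a sensing query could count freshly loaded rows; the sensor keeps poking while the count is 0 and succeeds once it becomes non-zero (the table and column names are placeholders):

    -- Placeholder table and column names.
    SELECT COUNT(*) FROM daily_load WHERE load_date = CURRENT_DATE;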


Time Delta Sensor

The Time Delta Sensor waits for a given amount of time before succeeding. Provide the following configurations:

Time Out Interval: Maximum time for which the sensor will wait.

Retries: Number of times workflow tries to run this task in case of failure.

Trigger Rule: Rules to define trigger conditions of this node.

SLA: Provide SLA related details (if required).


Connection ID

You can create connections against the Connection ID field. A Create Connection pop-up will appear; provide the information required to create a connection in it.

Workflow_Connection_ID

The following types of connections can be created:

  • BigQuery

  • HTTP

  • MSSQL

  • MySQL

  • Postgres


Workflow Nodes

All the workflows created in Gathr for the logged-in user are listed in the Workflow node.

You can add a workflow as an operator inside another workflow (similar to a pipeline). It will act as a sub-workflow: a separate workflow instance executed as part of the parent workflow.

Workflow_Existing_Workflows
