Configure Ingestion Application

On the Configure page, provide the details for each field as described below:

Application Name: A unique name is generated for the ingestion application. You can edit this name. Data Ingestion application names must start with a letter and may include alphanumeric characters and special characters such as !@$-;:()-_?=~/*<>’.
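The naming rule above can be checked before a name is submitted. The following is a minimal sketch, not Gathr's own validation logic. It assumes that "letter" means an ASCII letter, that the listed special characters are the complete allowed set, and that the trailing ’ may be either a typographic or a plain apostrophe. The function name is_valid_application_name is hypothetical.

```python
import re

# Assumption: these are the only special characters allowed, taken from the
# list in the text. Both the typographic and plain apostrophe are accepted,
# because the source does not make clear which one is meant.
ALLOWED_SPECIALS = "!@$-;:()_?=~/*<>'’"

# First character: an ASCII letter. Remaining characters: letters, digits,
# or any of the allowed special characters.
NAME_PATTERN = re.compile(
    r"[A-Za-z][A-Za-z0-9" + re.escape(ALLOWED_SPECIALS) + r"]*"
)


def is_valid_application_name(name: str) -> bool:
    """Return True if the name follows the rule sketched above."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Sales-Data(v2)", "1st_ingest", "my app"]:
        print(candidate, is_valid_application_name(candidate))
```

Spaces are rejected in this sketch only because they do not appear in the listed characters; the product may behave differently.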

Description: Optionally, add a description of the ingestion application being created.

Tags: Optionally, add custom tags to the ingestion application being created.

Application Deployment: Choose whether to deploy the application on a Gathr cluster or on an EMR cluster associated with the registered compute environment.

Provide the inputs for Application Deployment while creating or editing any application as described below.


Common fields for Gathr Clusters and Registered Clusters:

To run applications on registered clusters, you must first establish a virtual private connection from the User Settings > Compute Setup tab.

Cluster Size: Choose one of Free Tier, Extra Small, Small, Medium, Large, or a custom cluster size on which the application will be deployed.

Choose a cluster size based on the computing needs of the application.

The credit point (cp) consumption for each cluster size is listed below:

  • Extra Small: 1 credit/min

  • Small: 2 credits/min

  • Medium: 4 credits/min

  • Large: 8 credits/min

  • GPU - Powered by NVIDIA RAPIDS: 10 credits/min
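As a rough illustration of how the rates above add up, the sketch below multiplies the per-minute rate by the runtime. It assumes consumption scales linearly with runtime; the actual billing granularity and rounding are not stated here. Free Tier and custom sizes are left out because no rates are listed for them. The names CREDITS_PER_MINUTE and estimate_credits are hypothetical.

```python
# Rates copied from the list above (credits per minute).
CREDITS_PER_MINUTE = {
    "Extra Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "GPU": 10,
}


def estimate_credits(cluster_size: str, runtime_minutes: float) -> float:
    """Estimate credits consumed, assuming simple linear per-minute usage."""
    if runtime_minutes < 0:
        raise ValueError("runtime_minutes must not be negative")
    if cluster_size not in CREDITS_PER_MINUTE:
        raise KeyError(f"no listed rate for cluster size: {cluster_size!r}")
    return CREDITS_PER_MINUTE[cluster_size] * runtime_minutes


if __name__ == "__main__":
    # A Medium cluster running for 90 minutes: 4 credits/min * 90 min.
    print(estimate_credits("Medium", 90))  # 360
```

Under this assumption, the same 90-minute run on a Large cluster would consume twice as many credits as on a Medium cluster.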

Utilize micro cluster if available: The micro cluster option is available for the Extra Small cluster size.

It uses available free slots on Gathr Compute to optimize application submission for small-scale applications.

Store Raw Data in Error Logs: Enable this option to include the raw data in the error logs.


Additional configuration fields for Registered Clusters:

AWS Region: Option to select the preferred region associated with the compute environment.

AWS Account: Option to select the registered AWS account ID associated with the compute environment.

DNS Name: Option to select the DNS name linked to the VPC endpoint for Gathr.

EMR Cluster Config: Select a saved EMR cluster configuration from the list, or create one with the Add New Config for EMR Cluster option.

For more details on how to save EMR cluster configurations in Gathr, see EMR Cluster Configuration.

The application will be deployed on the EMR cluster using the custom configuration selected in this field.
