Save Data Assets

Save the data asset by providing the details described below:

Data Asset Name

Provide a unique name for the Data Asset.


Description

Option to add notes about the Data Asset.


Data Asset Category

Custom categories can be added to the Data Asset.

Start typing the category that you want to add and a list of the existing categories in the system will appear in the drop-down.

Select categories from the drop-down list to apply to this data asset.

You can manage the existing categories from the Categories List option.

Categories List

Click the Categories List option to manage the existing categories.

Here, you can add a new category, or edit or delete an existing one.

To learn more about how to add, edit, or delete a data asset category, see Categories List.


Data Asset Tags

Option to add customized tags for the Data Asset that is being created.


Data Asset Deployment

Option to deploy the Data Asset on either the Gathr cluster or an EMR cluster associated with the registered compute environment.

To use registered clusters for running a Data Asset, first register a cloud account from the User Settings > Compute Setup tab.

For the steps to register a cloud account, see Compute Setup.


Cluster Size

Option to choose one of the Free Tier, Extra Small, Small, Medium, Large, or custom cluster sizes on which the Data Asset will be deployed.

Choose a cluster size based on the computing needs.

The credit point (cp) consumption for each cluster size is listed below:

  • Extra Small: 1 credit/min

  • Small: 2 credits/min

  • Medium: 4 credits/min

  • Large: 8 credits/min

  • GPU - Powered by NVIDIA RAPIDS: 10 credits/min
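As a rough illustration of how these rates translate into usage, the sketch below estimates the credits consumed by a run of a given length. It assumes that credits accrue linearly with runtime and that no rounding or minimum charge applies; neither assumption is confirmed here. The function name `estimate_credits` is hypothetical and is not part of Gathr.

```python
# Credit rates per minute, copied from the list above.
CREDITS_PER_MINUTE = {
    "Extra Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "GPU - Powered by NVIDIA RAPIDS": 10,
}


def estimate_credits(cluster_size: str, runtime_minutes: float) -> float:
    """Estimate credits for one run (hypothetical helper, not a Gathr API).

    Assumes simple linear consumption: rate per minute times minutes.
    """
    if cluster_size not in CREDITS_PER_MINUTE:
        raise ValueError(f"No listed rate for cluster size: {cluster_size!r}")
    if runtime_minutes < 0:
        raise ValueError("runtime_minutes must be non-negative")
    return CREDITS_PER_MINUTE[cluster_size] * runtime_minutes
```

For example, under these assumptions a 30-minute run on a Medium cluster would come to `estimate_credits("Medium", 30)`, that is, 4 credits/min for 30 minutes.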

Utilize micro cluster if available

The micro cluster option is available for the Extra Small cluster size. It uses free slots available on Gathr Compute to optimize Data Asset submission for small-scale applications.


Registered Clusters Configuration Fields

AWS Region: Option to select the preferred region associated with the compute environment.

AWS Account: Option to select the registered AWS account ID associated with the compute environment.

DNS Name: Option to select the DNS name linked to the VPC endpoint for Gathr.

EMR Cluster Config: Select a saved EMR cluster configuration from the list, or create one with the Add New Config for EMR Cluster option.

For more details on how to save EMR cluster configurations in Gathr, see EMR Cluster Configuration.

The Data Asset will be deployed on the EMR cluster using the custom configuration that is selected from this field.


Extra Spark Submit Options

The configuration provided here is additionally submitted to Spark when the job runs. It must be provided strictly in the format given below:

--conf <key>=<value>
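To make the expected shape concrete, the sketch below builds an options string from a dictionary of Spark properties, assuming the field follows the standard spark-submit form of `--conf` followed by a `key=value` pair. The helper name `build_extra_spark_options` is hypothetical, and joining several options with single spaces is an assumption about how the field accepts more than one entry. The property names in the usage example (`spark.executor.memory`, `spark.sql.shuffle.partitions`) are ordinary Spark settings chosen only for illustration.

```python
def build_extra_spark_options(conf: dict[str, str]) -> str:
    """Format Spark properties as '--conf key=value' pairs.

    Hypothetical helper, not a Gathr API. Values containing whitespace are
    rejected because quoting rules for this field are not known here.
    """
    parts = []
    for key, value in conf.items():
        # A property name must be non-empty and free of '=' and whitespace.
        if not key or "=" in key or any(ch.isspace() for ch in key):
            raise ValueError(f"Invalid Spark property name: {key!r}")
        if any(ch.isspace() for ch in value):
            raise ValueError(f"Value for {key!r} contains whitespace")
        parts.append(f"--conf {key}={value}")
    # Assumption: multiple options are separated by single spaces.
    return " ".join(parts)


options = build_extra_spark_options({
    "spark.executor.memory": "4g",
    "spark.sql.shuffle.partitions": "64",
})
print(options)
```

Generating the string from a dictionary keeps each property on its own `--conf` flag, which avoids the common mistake of listing several `key=value` pairs after a single flag.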


Schedule Profiling

Automatic scheduling and unscheduling of profile runs can be managed from the Schedule Profiling option.

You can schedule a frequency to automatically trigger a profile run for the data asset at defined intervals.


Save and Explore Rules

Use this option to save the data asset and navigate to the Schema and Rules tab of the newly created data asset.

To know more about the data asset schema and rules, click here.


Save and Exit

Use this option to save the data asset and navigate to the Data Assets Listing page.
