Save Data Assets
Save the data asset as per the details explained below:
Data Asset Name
A unique name should be provided for the Data Asset.
Description
Option to add notes about the Data Asset.
Data Asset Category
A custom category can be added to the Data Asset.
Start typing the category that you want to add, and a list of the existing categories in the system appears in the drop-down.
Select categories from the drop-down list to apply to this data asset.
You can manage the existing categories from the Categories List option.
Categories List
Click the Categories List option to manage the existing categories.
Here, you can add a new category, or edit or delete an existing one.
To learn more about how to add, edit, or delete a data asset category, see Categories List.
Data Asset Tags
Option to add customized tags for the Data Asset that is being created.
Data Asset Deployment
Option to deploy the Data Asset on either the Gathr cluster or an EMR cluster associated with the registered compute environment.
To run a Data Asset on registered clusters, you must first register a cloud account from the User Settings > Compute Setup tab.
To understand the steps for registering a cloud account, see Compute Setup.
Cluster Size
Option to choose one of the Free Tier, Extra Small, Small, Medium, Large, or custom cluster sizes on which the Data Asset will be deployed.
Choose a cluster size based on the computing needs of the Data Asset.
The credit point (cp) utilization for each cluster size is listed below:
Extra Small: 1 credit/min
Small: 2 credits/min
Medium: 4 credits/min
Large: 8 credits/min
GPU - Powered by NVIDIA RAPIDS: 10 credits/min
A custom cluster can be used only with a registered compute environment that is available in a Business Plan.
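The per-minute rates listed above can be turned into a rough cost estimate for a run. The following is a minimal sketch, assuming simple linear per-minute billing with no rounding or minimum charge (the documentation does not state how partial minutes are billed). The function name `estimate_credits` and the short key `"GPU"` are illustrative and are not part of Gathr.

```python
# Credits per minute for each cluster size, copied from the list above.
# Free Tier and custom clusters are omitted because no rate is listed for them.
CREDITS_PER_MINUTE = {
    "Extra Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "GPU": 10,  # "GPU - Powered by NVIDIA RAPIDS" in the list above
}


def estimate_credits(cluster_size, runtime_minutes):
    """Estimate credit usage for a run (hypothetical helper).

    Assumption: billing is rate * minutes, with no rounding or minimum charge.
    """
    if runtime_minutes < 0:
        raise ValueError("runtime_minutes must be non-negative")
    if cluster_size not in CREDITS_PER_MINUTE:
        raise ValueError(f"no listed rate for cluster size: {cluster_size!r}")
    return CREDITS_PER_MINUTE[cluster_size] * runtime_minutes


# A 30-minute run on a Medium cluster: 4 credits/min * 30 min = 120 credits.
print(estimate_credits("Medium", 30))
```

Under these assumptions, doubling the cluster size doubles the per-minute rate, so a larger cluster costs less overall only if it finishes the job in less than half the time.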
Utilize micro cluster if available
The micro cluster option is available for the Extra Small cluster size. It uses available free slots on Gathr Compute to optimize Data Asset submission for small-scale applications.
Registered Clusters Configuration Fields
AWS Region: Option to select the preferred region associated with the compute environment.
AWS Account: Option to select the registered AWS account ID associated with the compute environment.
DNS Name: Option to select the DNS name linked to the VPC endpoint for Gathr.
EMR Cluster Config: Select a saved EMR cluster configuration from the list, or create one with the Add New Config for EMR Cluster option.
For more details on how to save EMR cluster configurations in Gathr, see EMR Cluster Configuration.
The Data Asset will be deployed on the EMR cluster using the custom configuration that is selected from this field.
Extra Spark Submit Options
The configuration provided here is passed to Spark as additional options when the job runs. It must be provided strictly in the format given below:
--conf
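As an illustration of the format, the sketch below builds an options string from a set of Spark properties. It assumes the standard spark-submit convention of one `--conf key=value` pair per property and assumes that several pairs can be separated by spaces in this field; neither detail is confirmed here for Gathr. The helper name `to_spark_conf_options` is hypothetical, and the two properties shown are ordinary Spark settings chosen only as examples.

```python
from typing import Mapping


def to_spark_conf_options(properties: Mapping[str, str]) -> str:
    """Format Spark properties as '--conf key=value' pairs (hypothetical helper).

    Assumption: the field accepts space-separated '--conf key=value' pairs,
    as spark-submit does on the command line.
    """
    parts = []
    for key, value in properties.items():
        if not key or any(ch.isspace() for ch in key):
            raise ValueError(f"invalid Spark property name: {key!r}")
        # How this field handles quoting is unknown, so values that contain
        # whitespace are rejected here rather than guessed at.
        if any(ch.isspace() for ch in value):
            raise ValueError(f"value for {key} contains whitespace: {value!r}")
        parts.append(f"--conf {key}={value}")
    return " ".join(parts)


options = to_spark_conf_options({
    "spark.executor.memory": "4g",
    "spark.sql.shuffle.partitions": "200",
})
print(options)
# --conf spark.executor.memory=4g --conf spark.sql.shuffle.partitions=200
```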
Schedule Profiling
Automatic scheduling and unscheduling of profile runs can be managed from the Schedule Profiling option.
You can schedule a frequency to automatically trigger a profile run for the data asset at defined intervals.
Save and Explore Rules
Use this option to save the data asset and navigate to the Schema and Rules tab of the newly created data asset.
To learn more about the data asset schema and rules, see Schema and Rules.
Save and Exit
Use this option to save the data asset and navigate to the Data Assets Listing page.