Navigating Data Assets
This topic explains the Data Assets listing page.
Recent Data Assets
Here is an overview of the key features on the Recent Data Assets page.
Create Data Asset: The “Create Data Asset” button at the top of the page lets you add a new data asset.
Refresh Listings: The “Refresh” button updates the listings instantly so that you always see the latest data asset information.
Search Field: Use the search field to locate data assets. Standard search lets you filter by source type, connection, category, or status. Alternatively, enable Natural Language Search and type queries in plain English to find datasets by, for example, specific conditions, content, run dates, scores, or particular columns.
Recent Data Assets: The left side lists the five most recently added or updated data assets.
Top Categories: The right side shows the top categories, which you can use to filter for the data assets relevant to you. Data assets without a category are grouped under Uncategorized.
View All Data Assets: The View All option toward the bottom opens the full listing of all available data assets, which is useful for comprehensive searches or broad exploration.
Data Assets Listing Page
Navigating Advanced Data Asset Management in Gathr
This section explains the features that help you discover, analyze, and manage your data assets.
Use the back button to return to the previous view, such as the Recent Data Assets page. From there, you can open the detailed listing page again using the View All option.
Advanced Search
The Advanced Search feature at the top of the page lets you search using multiple criteria:
Source Type: Filter data assets based on their source type.
Connections: Narrow down data assets by their associated connections.
Category: Filter data assets by the categories they are assigned to.
Status: Refine search results by specifying the data asset’s status.
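To make multi-criteria searching concrete, here is a minimal Python sketch that filters an in-memory list of data asset records. It is an illustration only, not Gathr's implementation or API: the `DataAsset` record and the `advanced_search` function are hypothetical, and the sketch assumes that different criteria combine with AND while several values within one criterion combine with OR. Assets without a category are matched as “Uncategorized”, following the Top Categories behavior described earlier.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class DataAsset:
    # Hypothetical record; Gathr's actual data model is not documented here.
    name: str
    source_type: str
    connection: str
    category: Optional[str]
    status: str


def advanced_search(
    assets: Iterable[DataAsset],
    source_types: Optional[Set[str]] = None,
    connections: Optional[Set[str]] = None,
    categories: Optional[Set[str]] = None,
    statuses: Optional[Set[str]] = None,
) -> List[DataAsset]:
    """Return assets that match every criterion that is set.

    Assumption: criteria combine with AND; values inside one criterion
    combine with OR. A criterion that is None or empty is not applied.
    """

    def matches(value: str, allowed: Optional[Set[str]]) -> bool:
        return not allowed or value in allowed

    return [
        a
        for a in assets
        if matches(a.source_type, source_types)
        and matches(a.connection, connections)
        # Uncategorized assets are matched under the "Uncategorized" label.
        and matches(a.category or "Uncategorized", categories)
        and matches(a.status, statuses)
    ]


if __name__ == "__main__":
    assets = [
        DataAsset("flights", "S3", "s3-prod", "Travel", "Published"),
        DataAsset("orders", "JDBC", "mysql-dev", None, "Draft"),
        DataAsset("airports", "S3", "s3-prod", "Travel", "Draft"),
    ]
    hits = advanced_search(assets, source_types={"S3"}, statuses={"Draft"})
    print([a.name for a in hits])  # ['airports']
```

Whether the product really uses AND across criteria is not stated in this topic; if it uses OR, only the `and` chain in the comprehension would change.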
Natural Language Search
Natural Language Search lets you find data assets using queries written in plain English.
Results are matched against data asset names, descriptions, tags, and other metadata.
Examples:
Search for datasets that have duplicate data change greater than 90%.
Find datasets that contain flight or airport information.
Discover datasets that were run in the current month.
Retrieve datasets with a score greater than 90%.
Search for datasets that have a column named ‘country’.
Sorting Options
You can sort the displayed data assets in ascending or descending order using the following criteria:
Name: Alphabetically arrange the data assets by their names.
Updated Date: Sort the data assets based on the most recent update.
Created Date: Arrange the data assets by their creation dates.
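The three sort criteria and the ascending/descending choice can be modeled as a key function plus a direction flag. The Python below is a hypothetical sketch, not Gathr's code: `AssetEntry`, `SORT_KEYS`, and `sort_assets` are illustrative names, and case-insensitive name sorting is an assumption about how the listing orders names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class AssetEntry:
    # Hypothetical listing entry holding the three sortable attributes.
    name: str
    created: datetime
    updated: datetime


# One key function per criterion. Name sorting is assumed to be
# case-insensitive; the actual product behavior may differ.
SORT_KEYS = {
    "name": lambda e: e.name.casefold(),
    "updated": lambda e: e.updated,
    "created": lambda e: e.created,
}


def sort_assets(entries: List[AssetEntry], by: str, descending: bool = False) -> List[AssetEntry]:
    """Return a new list sorted by the given criterion and direction."""
    if by not in SORT_KEYS:
        raise ValueError(f"unknown sort criterion: {by!r}")
    return sorted(entries, key=SORT_KEYS[by], reverse=descending)


if __name__ == "__main__":
    entries = [
        AssetEntry("orders", datetime(2024, 1, 5), datetime(2024, 3, 1)),
        AssetEntry("Airports", datetime(2024, 2, 1), datetime(2024, 2, 2)),
        AssetEntry("flights", datetime(2023, 12, 1), datetime(2024, 4, 1)),
    ]
    print([e.name for e in sort_assets(entries, "name")])  # ['Airports', 'flights', 'orders']
    print([e.name for e in sort_assets(entries, "updated", descending=True)])  # ['flights', 'orders', 'Airports']
```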
Save User Preferences
Save your sorting choices, default filters, and other settings so that the page opens the way you prefer.
Make your selections and click “Save Preferences”; your settings are remembered.
Refresh Listings
Click the “Refresh” button to update the data asset listings instantly without leaving the page, so that you always work with the latest information.
Create Data Asset
The “Create Data Asset” button at the top lets you add a new data asset.
Filtering Options
All Data Assets: This default view displays all available data assets.
My Data Assets: Filter to see only the data assets you own.
Favorites: Display the data assets marked as favorites for quick access.
Data Asset Listing Entry
Each data asset listing entry includes:
Data Source Icon: Represents the source of the data asset.
Scheduled Date Info: Indicates whether the data asset is scheduled and, if so, shows the next scheduled date.
Data Asset Name: The name of the data asset followed by its state for quick identification.
Data Asset Status: The status of the data asset, such as Draft, Published, or Deprecated.
Profiling Status: Provides an overview of the data asset’s profiling status.
Version Details: Indicates the latest version of the data asset.
Data Quality Score: Shows the overall data quality score.
Columns and Records: Displays the number of columns and records in the data asset.
Last Updated: Shows when the data asset was last updated along with the user details.
Last Profile Run: Displays information about the last profile run.
Description, Tags, Categories: Any added description, tags, or categories are displayed.
View Summary: Access a comprehensive summary of the data asset’s details, data quality, and more.
Run Profile: Runs profiling on the data asset, if you are permitted to initiate profiling.
Only data asset owners can initiate profiling, and only within the project where the data asset was originally created or saved.
Other Actions: Additional actions that depend on your role and permissions. These actions are described in detail below.
Data Asset Listing Entry Actions
The ellipsis button on each data asset entry offers actions based on your role and permissions.
View Details
To see the details of a data asset, click the View option.
The data asset opens in a tabbed view that displays its details.
Delete Data Asset
Data asset owners can delete assets within their project.
Use in an Ingestion Application
Use the data asset as a data source in a new Ingestion application.
After you click Use in an Ingestion Application, a new Ingestion application draft opens with the chosen data asset as a data source.
Complete the data asset's configuration and build the rest of the Ingestion application around it using the required components.
When you use a data asset in an Ingestion application, note the following configuration parameters:
Always Use Latest Data Asset Version
To enable this option, select the corresponding check box in the application configuration.
When this option is enabled, your Ingestion application always checks for the latest available data asset version at runtime.
However, if a later version of the data asset introduces schema changes, you must match the schema manually to avoid impacting the application.
Select Data Asset Version
This option lets you choose a specific version of the data asset to consume in your pipeline.
Review your selected version regularly and update it as newer data asset versions become available.
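The difference between the two version options can be expressed as a small resolution rule. The Python below is a hypothetical model, not Gathr's API: `resolve_version` returns the highest version number when “Always Use Latest Data Asset Version” is enabled and the pinned version otherwise, and `schema_changed` flags when the resolved version's columns differ from the schema the application was built against, which is the situation that requires manual schema matching. Representing a schema as a column-to-type mapping is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AssetVersion:
    number: int
    # Hypothetical schema representation: column name -> data type.
    schema: Dict[str, str] = field(default_factory=dict)


def resolve_version(
    versions: List[AssetVersion],
    always_latest: bool,
    pinned: Optional[int] = None,
) -> AssetVersion:
    """Pick the version an application would read at runtime (hypothetical model)."""
    if not versions:
        raise ValueError("data asset has no versions")
    if always_latest:
        # Re-evaluated on every run, so new versions are picked up automatically.
        return max(versions, key=lambda v: v.number)
    for v in versions:
        if v.number == pinned:
            return v
    raise ValueError(f"pinned version {pinned} not found")


def schema_changed(expected: Dict[str, str], resolved: AssetVersion) -> bool:
    """True if the resolved schema differs from the one the application was built with."""
    return expected != resolved.schema


if __name__ == "__main__":
    versions = [
        AssetVersion(1, {"id": "int", "city": "string"}),
        AssetVersion(2, {"id": "int", "city": "string", "country": "string"}),
    ]
    app_schema = versions[0].schema
    latest = resolve_version(versions, always_latest=True)
    print(latest.number, schema_changed(app_schema, latest))  # 2 True
    pinned = resolve_version(versions, always_latest=False, pinned=1)
    print(pinned.number, schema_changed(app_schema, pinned))  # 1 False
```

The trade-off is visible in the usage lines: the latest-version option picks up new data automatically but can surface a schema change at runtime, while a pinned version stays stable until you update it.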
Use in an ETL Application
Use the data asset as a data source in a new ETL application.
After you click Use in an ETL Application, the ETL canvas opens with the chosen data asset as a data source.
Complete the data asset's configuration and build the rest of the ETL application around it using the required components.
When you use a data asset in an ETL application, note the following configuration parameters:
Always Use Latest Data Asset Version
To enable this option, select the corresponding check box in the application configuration.
When this option is enabled, your ETL application always checks for the latest available data asset version at runtime.
However, if a later version of the data asset introduces schema changes, you must match the schema manually to avoid impacting the application.
Select Data Asset Version
This option lets you choose a specific version of the data asset to consume in your pipeline.
Review your selected version regularly and update it as newer data asset versions become available.
Mark or Unmark as Favorite
Mark frequently used data assets as favorites for quick access.
You can also remove data assets from favorites using the same option.
Edit Configuration
Use this option to edit the configuration of an existing data asset.
You can perform this action from the data asset listing page or from any tab of an opened data asset.
The existing schema of a published data asset cannot be changed.
For published data assets, the Edit Configuration option allows only the addition of columns or updates to the number of records.
If there’s a schema mismatch during validation of the updated data asset, an error will prevent further processing until the issue is resolved.
After editing the configuration, validate the schema. If validation succeeds, update the data asset's configuration.
The data asset is updated, and the Recent Data Assets page is displayed.
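The rule for published data assets (existing columns stay as they are, and only new columns may be added) amounts to an additive-only schema compatibility check. The Python sketch below illustrates that rule under stated assumptions and does not reproduce Gathr's actual validation: a schema is modeled as a hypothetical mapping from column name to data type, and a removed column or a changed type counts as a mismatch. A renamed column appears as a removal plus an addition, so it is also rejected.

```python
from typing import Dict, List

Schema = Dict[str, str]  # hypothetical: column name -> data type


def schema_mismatches(published: Schema, updated: Schema) -> List[str]:
    """List problems that would block an edit to a published data asset.

    Additive-only rule: every published column must still exist with the
    same type. Extra columns in `updated` are allowed.
    """
    problems = []
    for column, dtype in published.items():
        if column not in updated:
            problems.append(f"column removed: {column}")
        elif updated[column] != dtype:
            problems.append(f"type changed for {column}: {dtype} -> {updated[column]}")
    return problems


def validate_published_edit(published: Schema, updated: Schema) -> None:
    """Raise if the updated schema is not an additive extension of the published one."""
    problems = schema_mismatches(published, updated)
    if problems:
        # Like the documented behavior, a mismatch stops further processing.
        raise ValueError("schema mismatch: " + "; ".join(problems))


if __name__ == "__main__":
    published = {"id": "int", "city": "string"}
    # Adding a column passes validation.
    validate_published_edit(published, {"id": "int", "city": "string", "country": "string"})
    # Changing a type and dropping a column are both reported.
    print(schema_mismatches(published, {"id": "long"}))
```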
Profile
Schedule a profiling job or change the cluster for the data asset.
You can schedule automatic profiling of a data asset using the scheduling options.
You can set the scheduling frequency to Minutes, Hourly, Daily, Weekly, Monthly, or Yearly.
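To show how a frequency translates into the next scheduled date shown on each entry, here is a hypothetical Python sketch; it is not how Gathr computes schedules. The `next_run` function and its `every` interval parameter are illustrative assumptions. Fixed frequencies use `timedelta`, while Monthly and Yearly advance by calendar months and clamp the day to the target month's length (for example, January 31 plus one month gives February 29 in a leap year).

```python
import calendar
from datetime import datetime, timedelta


def add_months(moment: datetime, months: int) -> datetime:
    """Advance by whole calendar months, clamping the day to the target month's length."""
    month_index = moment.month - 1 + months
    year = moment.year + month_index // 12
    month = month_index % 12 + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def next_run(last_run: datetime, frequency: str, every: int = 1) -> datetime:
    """Compute the next profiling run for a hypothetical schedule.

    `frequency` is one of: minutes, hourly, daily, weekly, monthly, yearly.
    `every` is the interval count, e.g. every=15 with "minutes".
    """
    fixed = {
        "minutes": timedelta(minutes=every),
        "hourly": timedelta(hours=every),
        "daily": timedelta(days=every),
        "weekly": timedelta(weeks=every),
    }
    if frequency in fixed:
        return last_run + fixed[frequency]
    if frequency == "monthly":
        return add_months(last_run, every)
    if frequency == "yearly":
        return add_months(last_run, 12 * every)
    raise ValueError(f"unknown frequency: {frequency!r}")


if __name__ == "__main__":
    print(next_run(datetime(2024, 1, 31, 9, 0), "monthly"))  # 2024-02-29 09:00:00
```

Because this sketch computes each run from the previous one, day clamping can drift (January 31, then February 29, then March 29); a scheduler that always computes from the original start date avoids that.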
For the cluster options available for data asset profiling, see Cluster Size.
View Summary
For a brief overview of a data asset, click the “View Summary” button on its entry.
The entry whose summary you are viewing is visually distinguished from the other entries.
The summary includes:
Data Asset Details: Name, assigned state, connection details, and version information.
Data Quality Indicator: A visual indicator of the data asset’s quality.
Metadata: Number of rows, columns, size, and source/connection details.
Context: Description, categories, and associated tags.
Applications Created Using Data Asset: Shows the total number of applications created using this data asset.
Click the View option to see the list of applications that use the data asset.
The View option is disabled if the application has been deleted and is currently in the Recycle Bin.