Create Data Assets

A data asset is an organized collection of data, typically structured information such as databases and tables. It contains a schema and a set of rules that can be applied to that schema. You can create Data Assets in Gathr and use them in ETL and Ingestion applications, so that those applications run with a predefined schema and a pre-configured set of rules.
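To make the "schema plus rules" idea concrete, here is a minimal Python sketch. It is not Gathr's API or internal model: the `DataAsset` and `Rule` classes, their fields, and the example `orders` asset are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Rule:
    """A named check applied to one column (hypothetical structure)."""
    column: str
    description: str
    check: Callable[[Any], bool]


@dataclass
class DataAsset:
    """A schema (column name -> type name) plus rules over that schema."""
    name: str
    schema: Dict[str, str]
    rules: List[Rule] = field(default_factory=list)

    def violations(self, row: Dict[str, Any]) -> List[str]:
        """Return the descriptions of the rules that the given row fails."""
        return [
            rule.description
            for rule in self.rules
            if not rule.check(row.get(rule.column))
        ]


# Illustrative asset: two columns and one rule on the "amount" column.
orders = DataAsset(
    name="orders",
    schema={"order_id": "integer", "amount": "double"},
    rules=[
        Rule(
            column="amount",
            description="amount must be non-negative",
            check=lambda value: value is not None and value >= 0,
        )
    ],
)
```

In this sketch, any application that reuses `orders` gets the same schema and the same rule without redefining them, which mirrors the reuse described above.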

The ETL and Ingestion applications have many emitters that support saving data assets directly from their configuration pages.

Create Data Asset

Data Assets help you gain insights into the data and analyze each attribute of the Data Asset. A Data Asset created in Gathr can then be used in any pipeline.

To create a Data Asset, navigate to the Data Assets page and click the plus icon or the ADD NEW Data Asset option.

Select Data Source

The sources that can be saved as Data Assets are displayed.

To learn more about the data asset sources supported in Gathr, see Data Asset Supported Sources →


Configure Data Source

Configure the source details as explained in the relevant topic:

Component        Reference Topic
ADLS             ADLS Data Asset Source Configuration →
Amazon S3        Amazon S3 Data Asset Source Configuration →
BigQuery         BigQuery Data Asset Source Configuration →
Elasticsearch    Elasticsearch Data Asset Source Configuration →
File Upload      File Upload Data Asset Source Configuration →
GCS              GCS Data Asset Source Configuration →
MSSQL            MSSQL Data Asset Source Configuration →
MySQL            MySQL Data Asset Source Configuration →
Oracle           Oracle Data Asset Source Configuration →
PostgreSQL       PostgreSQL Data Asset Source Configuration →
Redshift         Redshift Data Asset Source Configuration →
Sample           Sample Data Asset Source Configuration →
SFTP             SFTP Data Asset Source Configuration →
Snowflake        Snowflake Data Asset Source Configuration →

Preview Data Source

After the source is configured, the data from the source is represented as a schema. This process is called detect schema. The schema is divided into columns, each shown with its Column Name, Column Alias, Data Type, and Sample Values.

Column Alias and Data Type are editable on this page. You can also add an optional description for each column.
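The detect schema step can be pictured with a short Python sketch. It only illustrates the idea and is not how Gathr implements detection: the `Column` class, the `infer_type` and `detect_schema` functions, the type names, and the sample rows are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Column:
    """One schema column as shown on a preview page (hypothetical model)."""
    name: str
    alias: str
    data_type: str
    sample_values: List[Any]
    description: str = ""


def infer_type(values: List[Any]) -> str:
    """Pick a simple type name from the non-null sample values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "string"
    if all(isinstance(v, bool) for v in non_null):
        return "boolean"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        return "integer"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "double"
    return "string"


def detect_schema(rows: List[Dict[str, Any]], max_samples: int = 3) -> List[Column]:
    """Build one Column per key found in the rows, in first-seen order."""
    names: List[str] = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    columns = []
    for name in names:
        values = [row.get(name) for row in rows]
        columns.append(
            Column(
                name=name,
                alias=name,  # the alias starts out equal to the column name
                data_type=infer_type(values),
                sample_values=values[:max_samples],
            )
        )
    return columns


rows = [
    {"id": 1, "price": 9.5, "city": "Pune"},
    {"id": 2, "price": 12, "city": "Oslo"},
]
schema = detect_schema(rows)

# Editing the alias and description, as the preview page allows.
schema[2].alias = "customer_city"
schema[2].description = "City where the order was placed"
```

The sketch keeps the original column name separate from the alias, so renaming a column for downstream use does not lose track of the source field.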

To learn more about the data source schema preview, see Schema Preview →


Save Data Asset

Save the Data Asset as explained in the topic Save Data Asset →.