Datasets Introduction

A Dataset is a saved entity that contains a schema and rules.

Rules can be configured in a dataset to transform its schema or results.

Channels in pipelines can reuse a dataset's schema and rules, so that data from those channels is transformed with the corresponding transformations.
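As a rough illustration of the idea above, the following Python sketch models a dataset as a saved schema plus an ordered list of rules, and applies those rules to a record as a channel might. It is not the product's API: the Dataset class, its apply method, the rule functions, and the sample column names are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Dataset:
    """Hypothetical stand-in for a saved dataset: a schema plus rules."""
    name: str
    schema: Dict[str, str]  # column name -> type name
    rules: List[Callable[[Record], Record]] = field(default_factory=list)

    def apply(self, record: Record) -> Record:
        # Run every rule in order; each rule returns a new record.
        for rule in self.rules:
            record = rule(record)
        return record


def trim_city(record: Record) -> Record:
    # Example rule that changes a result value.
    return {**record, "city": str(record["city"]).strip()}


def add_amount_cents(record: Record) -> Record:
    # Example rule that extends the schema with a derived column.
    return {**record, "amount_cents": int(record["amount"] * 100)}


orders = Dataset(
    name="orders",
    schema={"id": "int", "city": "string", "amount": "double"},
    rules=[trim_city, add_amount_cents],
)

# A channel reusing the dataset would pass each incoming record through it.
result = orders.apply({"id": 1, "city": " Pune ", "amount": 100.0})
print(result)
```

Keeping the rules as an ordered list means the same saved definition can be reused anywhere the dataset is referenced, which is the reuse the paragraph above describes.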

A Dataset is used to analyze data through its generated profile results, which are based on the actual data in the source defined for the Dataset. Profile results are mainly used by Data Scientists.

A Profile is the result of this analysis: a statistical summary of the columns of a Dataset.
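To make the notion of a column-level statistical analysis concrete, here is a minimal Python sketch that computes a few common statistics per column. The function name profile_columns, the choice of statistics, and the sample rows are assumptions made for illustration; the statistics in an actual profile may differ.

```python
from statistics import mean


def profile_columns(rows):
    """Return simple per-column statistics for a list of dict records."""
    # Collect the values of each column across all rows.
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    profile = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        stats = {
            "count": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Add numeric statistics only when every non-null value is a number.
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if numeric and len(numeric) == len(present):
            stats.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        profile[name] = stats
    return profile


# Hypothetical sample data standing in for the dataset's source.
rows = [
    {"id": 1, "city": "Pune", "amount": 10.0},
    {"id": 2, "city": "Delhi", "amount": None},
    {"id": 3, "city": "Pune", "amount": 30.0},
]
profile = profile_columns(rows)
for column, stats in profile.items():
    print(column, stats)
```

Storing each run's output with a timestamp would give a simple history of profile results over time.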

A history of profile results is maintained under Profile History.

You can view the association or flow of datasets between pipelines in the system. This is Dataset Lineage. You can expand the lineage view to any level (parent/child).

Dataset Versions can also be created when the schema or rules change. The schema, rules, and lineage are then listed by version.

Datasets can be created externally, or from a Channel or an Emitter.
