Data Lineage

Data lineage refers to the tracking and visualization of how data flows, transforms, and is consumed throughout its lifecycle.

Actions Available

The following actions are available on each tab of the data asset view page, as well as on the listing page.


Edit Data Asset Name: Modify the name of the data asset to better suit your needs.

Additional Options: Access further actions, including deleting the data asset, using it in Ingestion or ETL applications, marking it as a favorite, and configuring it.

Start Profiling: Initiate data profiling to gain insights into your data’s characteristics and quality.

Back to Data Assets Listing: Return to the list of all data assets for an overview of your data.



Lineage Representation

The Lineage tab shows how a data asset is associated with Gathr applications.

The following example shows a sample lineage:


Lineage Diagram Description:

  1. Alaska_Flights_Data: This data asset serves as a source of flight data from Alaska.

  2. Florida_Flights_Data: Another source data asset providing flight information, this time from Florida.

  3. Merge_Flights_Data (ETL Application): These two data assets, Alaska_Flights_Data and Florida_Flights_Data, are ingested into the ETL application named Merge_Flights_Data. Here, they are combined to create a new data asset.

  4. Combined_Flights_Data: This is the result of the ETL process within Merge_Flights_Data. It represents a merged data asset that combines flight data from Alaska and Florida, with some data transformations applied.

  5. Process_Flights_Data (Pipeline): Combined_Flights_Data is then used as input to the data processing pipeline named Process_Flights_Data. Within this pipeline, further transformations, analyses, or actions are applied to the data.

  6. Delayed_Flights_Data: As a result of the processing carried out in the Process_Flights_Data pipeline, a new data asset named Delayed_Flights_Data is created. This data asset contains flight information with specific details related to delays.

In this lineage diagram, you can trace the flow of data from the source data assets through the ETL application and the pipeline, leading to the creation of the final data asset, Delayed_Flights_Data.

It illustrates the path and dependencies of these data assets within your data processing workflow.
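The example lineage is a directed graph, so tracing it amounts to walking its arrows backwards. The following Python sketch encodes the example as a list of edges. It finds everything upstream of Delayed_Flights_Data and identifies which of those nodes are the original source data assets. This is an illustration only: the `EDGES` list and the `upstream` and `source_assets` functions are hypothetical and are not part of Gathr.

```python
from collections import deque

# Hypothetical model of the example lineage. The node names come from the
# diagram description; the data structure itself is an illustrative
# assumption, not a Gathr API.
EDGES = [
    ("Alaska_Flights_Data", "Merge_Flights_Data"),
    ("Florida_Flights_Data", "Merge_Flights_Data"),
    ("Merge_Flights_Data", "Combined_Flights_Data"),
    ("Combined_Flights_Data", "Process_Flights_Data"),
    ("Process_Flights_Data", "Delayed_Flights_Data"),
]


def upstream(node, edges):
    """Return every node that feeds into `node`, directly or indirectly."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)

    # Breadth-first walk over parent links, visiting each node once.
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def source_assets(node, edges):
    """Return the upstream nodes that have no parents of their own."""
    targets = {dst for _, dst in edges}
    return {n for n in upstream(node, edges) if n not in targets}


print(sorted(upstream("Delayed_Flights_Data", EDGES)))
print(sorted(source_assets("Delayed_Flights_Data", EDGES)))
```

For Delayed_Flights_Data, the walk reaches both applications and Combined_Flights_Data. It ends at Alaska_Flights_Data and Florida_Flights_Data, which have no parents and are therefore reported as the sources.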

  • An association is created when a data asset's schema and rules are used in a channel. This lets you reuse the same entities across multiple pipeline channels through the Use Existing Data Asset option.

  • For an emitter, only the schema of the data asset is associated.

  • The lineage section shows the lifecycle of the data asset: how it flows through the system and which applications use or produce it.

Expand-Collapse Lineage

  • Initially, a basic lineage is shown. You can expand a data asset or application to reveal further parent-child associations and flows.

  • An arrow from a data asset to an application signifies that the data asset is used as a channel in that application.

  • An arrow from an application to a data asset signifies that the data asset is saved by an emitter of that application.
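The arrow rules can be illustrated with a sketch of a single expand step. The step lists a node's immediate parents and children and states what each arrow means. The node kinds, the subset of the example lineage used here, and the `describe` and `expand` functions are hypothetical assumptions for illustration. This is not Gathr's implementation.

```python
# Hypothetical sketch of one "expand" step on a lineage node.
# Arrow meanings follow the documented rules:
#   data asset -> application : the asset is used as a channel (input)
#   application -> data asset : the asset is saved by an emitter (output)
# Node kinds and edges are an illustrative subset of the example lineage.

KINDS = {
    "Merge_Flights_Data": "application",
    "Combined_Flights_Data": "asset",
    "Process_Flights_Data": "application",
    "Delayed_Flights_Data": "asset",
}

EDGES = [
    ("Merge_Flights_Data", "Combined_Flights_Data"),
    ("Combined_Flights_Data", "Process_Flights_Data"),
    ("Process_Flights_Data", "Delayed_Flights_Data"),
]


def describe(src, dst, kinds):
    """Explain what an arrow from src to dst means."""
    if kinds[src] == "asset" and kinds[dst] == "application":
        return f"{src} is used as a channel in {dst}"
    if kinds[src] == "application" and kinds[dst] == "asset":
        return f"{dst} is saved by an emitter of {src}"
    raise ValueError(f"unexpected edge {src} -> {dst}")


def expand(node, edges, kinds):
    """Describe every arrow that enters or leaves `node` (one hop only)."""
    return [describe(s, d, kinds) for s, d in edges if node in (s, d)]


for line in expand("Combined_Flights_Data", EDGES, KINDS):
    print(line)
```

Expanding Combined_Flights_Data reveals one incoming arrow from Merge_Flights_Data (an emitter output) and one outgoing arrow to Process_Flights_Data (a channel input). Repeating the step on those neighbors would uncover the rest of the graph.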

View Lineage for a Specific Version

Use the version drop-down on the Lineage page to select a version of the data asset and view the lineage details for that version.
