Data Lineage

Definition updated in November 2023

What is data lineage?

Data lineage is the process of tracking how data flows over time, giving insight into where the data originated, how it has changed, and where it ends up in the data pipeline. Data lineage tools keep a record of data throughout its lifecycle, including the source data and every transformation applied during ETL or ELT operations. Users can use data lineage to verify that their data comes from a reliable source, has gone through the necessary transformations, and has been loaded into the intended location. Data lineage is essential when strategic decisions depend on reliable information: if data operations are not properly tracked, verifying data becomes nearly impossible, or at the very least extremely expensive and time-consuming. By letting users trace data upstream and downstream, from source to destination, to find anomalies and fix them, data lineage helps them evaluate data accuracy and consistency.
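To make the idea of tracing upstream and downstream concrete, here is a minimal sketch in Python. It assumes lineage can be modeled as a directed graph in which datasets are nodes and transformations are edges. The class name `LineageGraph`, its methods, and the dataset names are hypothetical and chosen only for illustration; real lineage tools capture this metadata automatically and store far more detail.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage record: datasets are nodes, transformations are edges."""

    def __init__(self):
        # source -> {target: transformation} and the reverse mapping
        self._downstream = defaultdict(dict)
        self._upstream = defaultdict(dict)

    def record(self, source, target, transformation):
        """Record that `target` was produced from `source` by `transformation`."""
        self._downstream[source][target] = transformation
        self._upstream[target][source] = transformation

    @staticmethod
    def _walk(start, edges):
        """Breadth-first traversal returning every dataset reachable from `start`."""
        visited = {start}
        order = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in edges.get(node, {}):
                if neighbor not in visited:
                    visited.add(neighbor)
                    order.append(neighbor)
                    queue.append(neighbor)
        return order

    def upstream(self, dataset):
        """All datasets that `dataset` was derived from, nearest first."""
        return self._walk(dataset, self._upstream)

    def downstream(self, dataset):
        """All datasets derived from `dataset`, nearest first."""
        return self._walk(dataset, self._downstream)


# Hypothetical pipeline: source system -> staging -> warehouse -> report.
lineage = LineageGraph()
lineage.record("crm.customers", "staging.customers", "extract")
lineage.record("staging.customers", "warehouse.dim_customer", "deduplicate")
lineage.record("warehouse.dim_customer", "reports.churn", "aggregate")

# Where did the report's data come from?
print(lineage.upstream("reports.churn"))
# What is affected if the source table is wrong?
print(lineage.downstream("crm.customers"))
```

In this sketch, an anomaly spotted in `reports.churn` can be followed upstream to each dataset it depends on, while a problem found in `crm.customers` can be followed downstream to everything it feeds.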

