What are data pipelines?
A data pipeline is a method for moving raw data from one or more sources into an analysis-ready data store, such as a data lake or data warehouse. In most cases, the data is processed before it lands in the repository: transformations such as filtering, masking, and aggregation ensure that it is properly integrated and standardized. This matters especially when the destination is a relational database, because that kind of repository requires alignment, meaning the incoming data's columns and types must match the existing schema before new records can update the current ones.
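As a rough illustration of those transformation steps, the sketch below runs filtering, masking, and aggregation over a few in-memory records in plain Python. The record layout, field names, and masking rule are invented for the example, not taken from any particular pipeline tool.

```python
# Minimal sketch of a transformation stage in a data pipeline.
# Record and field names here are illustrative assumptions.

raw_records = [
    {"user": "alice", "email": "alice@example.com", "amount": 120.0},
    {"user": "bob",   "email": "bob@example.com",   "amount": None},  # incomplete record
    {"user": "alice", "email": "alice@example.com", "amount": 80.0},
]

# 1. Filtering: drop records that fail a basic quality check.
filtered = [r for r in raw_records if r["amount"] is not None]

# 2. Masking: obscure sensitive fields before loading.
def mask_email(email):
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

masked = [{**r, "email": mask_email(r["email"])} for r in filtered]

# 3. Aggregation: total amount per user, ready for the warehouse.
totals = {}
for r in masked:
    totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]

print(totals)  # {'alice': 200.0}
```

In a real pipeline these steps would typically run in a dataframe library or a SQL engine rather than hand-written loops, but the shape is the same: clean, protect, then summarize the data before it reaches the repository.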