This scenario illustrates the power of comprehensive data lineage: with the ability to trace data at the column level, within a system, and across systems, an organization can quickly identify and resolve data issues, improving data quality and trust and supporting effective Data Change Management.
The scenario involves a Data Analyst named Alex in a large financial institution.
Alex is tasked with creating a report to analyse customer transactions. For this, he uses a column named 'transaction_amount' from a central data warehouse.
As he starts the planned change investigation, Alex notices some irregularities in the data: there are negative values in the 'transaction_amount' column that do not make sense in this context.
This is where Alex's journey through the different layers of data lineage begins:
- End-to-End Column Lineage: Alex starts by investigating the end-to-end lineage of the 'transaction_amount' column. He can see the various transformations applied to this data point, as well as where it's used in downstream reports. He discovers that the column is derived from two columns in a source system - 'transaction_type' (credit or debit) and 'transaction_value'. A transformation is applied to convert debit transaction values to negative.
- Inner-System Lineage: Alex then looks at the inner-system lineage within the source system. He notices that the 'transaction_type' column is derived from several fields, including a 'transaction_code'. A particular transaction code is used to identify debit transactions, and an error in mapping this code might be causing the problem.
- Cross-System Lineage: Finally, Alex uses the cross-system lineage to find all systems feeding into the 'transaction_type' column. He discovers an upstream system where the 'transaction_code' originates. It turns out that a recent update to that system changed the 'transaction_code' values for debit transactions.
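The three layers Alex walks through can be pictured as one dependency graph that is traversed upstream from the report column. The following is a minimal sketch, not how any particular lineage tool stores its metadata: the system prefixes (`warehouse`, `source`, `upstream`), the `LINEAGE` dictionary, and the `upstream_columns` function are all hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical column-level lineage: each column maps to the columns it is
# derived from. The system prefixes are illustrative assumptions.
LINEAGE = {
    "warehouse.transaction_amount": [
        "source.transaction_type",
        "source.transaction_value",
    ],
    "source.transaction_type": ["source.transaction_code"],
    "source.transaction_code": ["upstream.transaction_code"],
}


def upstream_columns(column, lineage):
    """Return every column that feeds `column`, nearest first (breadth-first)."""
    seen = []
    queue = deque(lineage.get(column, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        # Follow the lineage one more hop towards the origin of the data.
        queue.extend(lineage.get(current, []))
    return seen


for name in upstream_columns("warehouse.transaction_amount", LINEAGE):
    print(name)
```

The breadth-first order mirrors Alex's investigation: the direct inputs of 'transaction_amount' come first, then the fields inside the source system, and finally the upstream system where 'transaction_code' originates.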
Armed with this information, Alex collaborates with the data engineering team to correct the error in mapping the new 'transaction_code' and ensures that the transformation logic applied to the 'transaction_amount' column is accurate. As a result, the data quality issue is resolved, and Alex can confidently proceed with his report, trusting the data he's using.
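To make the root cause concrete, the sketch below shows how a stale code mapping can put the wrong sign on an amount. It assumes the transformation described in the scenario (debit values become negative); the code values "10" and "20", the two mapping dictionaries, and the function `derive_amount` are invented for illustration, since the scenario does not state the real codes or logic.

```python
# Hypothetical mappings from 'transaction_code' to 'transaction_type'.
# STALE_MAPPING reflects the codes before the upstream update;
# CORRECTED_MAPPING reflects the codes after it (assumed values).
STALE_MAPPING = {"10": "credit", "20": "debit"}
CORRECTED_MAPPING = {"10": "debit", "20": "credit"}


def derive_amount(transaction_code, transaction_value, code_mapping):
    """Derive 'transaction_amount': debit values are converted to negative."""
    transaction_type = code_mapping[transaction_code]
    if transaction_type == "debit":
        return -transaction_value
    return transaction_value


# The same upstream record, evaluated with each mapping.
stale = derive_amount("20", 100.0, STALE_MAPPING)
corrected = derive_amount("20", 100.0, CORRECTED_MAPPING)
print(stale, corrected)
```

With the stale mapping the record is treated as a debit and comes out negative; with the corrected mapping the same record keeps its positive value. The transformation itself is unchanged, which is why the fix belongs in the code mapping rather than in the report.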