Skip to main content

Column-Level Lineage

For Data Analysts

Column-level lineage tracks exactly how each column flows through your data pipeline, from raw source columns through SQL transformations to semantic layer measures and metrics. This granular tracking enables precise impact analysis and complete data transparency.

Understanding Column Lineage

While model-level lineage shows which models depend on others, column-level lineage reveals the exact column-to-column relationships:

Column-Level Lineage

Track how individual columns flow through transformations.

React Flow mini map
Transformation Types
Direct (unchanged)
Renamed
Derived (calculated)
Aggregated
Legend
🗄️Source
⚙️Staging
📊Mart
🧊Cube
🎯Metric

Column-level lineage answers questions like:

  • Which source columns contribute to this metric?
  • What happens if I change the data type of a column?
  • Where does this measure get its values from?

Lineage Edge Types

Olytix Core categorizes column relationships using edge types that describe the nature of each transformation:

DIRECT

A column passes through unchanged with the same name.

SELECT
order_id, -- DIRECT: stg_orders.order_id → fct_orders.order_id
customer_id -- DIRECT: stg_orders.customer_id → fct_orders.customer_id
FROM {{ ref('stg_orders') }}

RENAMED

A column is renamed using an alias.

SELECT
amount AS order_amount, -- RENAMED: stg_orders.amount → fct_orders.order_amount
created_at AS order_date -- RENAMED: stg_orders.created_at → fct_orders.order_date
FROM {{ ref('stg_orders') }}

DERIVED

A column is calculated from one or more source columns.

SELECT
amount * quantity AS total, -- DERIVED from amount, quantity
amount * (1 - discount_pct) AS net_amount, -- DERIVED from amount, discount_pct
CASE WHEN status = 'paid' THEN amount ELSE 0 END, -- DERIVED from status, amount
COALESCE(shipping_cost, 0) AS shipping -- DERIVED from shipping_cost
FROM {{ ref('stg_orders') }}

AGGREGATED

A column results from an aggregation function.

SELECT
customer_id,
SUM(amount) AS total_spent, -- AGGREGATED from amount
COUNT(*) AS order_count, -- AGGREGATED (no source column)
AVG(amount) AS avg_order_value, -- AGGREGATED from amount
MAX(order_date) AS last_order_date -- AGGREGATED from order_date
FROM {{ ref('stg_orders') }}
GROUP BY customer_id

JOINED

A column originates from a joined table.

SELECT
o.order_id,
o.amount,
c.customer_name, -- JOINED from stg_customers
c.email, -- JOINED from stg_customers
p.product_name -- JOINED from stg_products
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.customer_id
LEFT JOIN {{ ref('stg_products') }} p ON o.product_id = p.product_id

Semantic Layer Edge Types

Additional edge types track relationships in the semantic layer:

Edge TypeDescriptionContext
MEASURE_SOURCEColumn flows to a cube measureCube measure definition
DIMENSION_SOURCEColumn flows to a cube dimensionCube dimension definition
METRIC_SOURCEMeasure flows to a metricMetric expression
FILTEREDColumn used in a filterWHERE clause or metric filter

Tracing Lineage Through the Stack

End-to-End Example

# sources/raw.yml
sources:
- name: raw
tables:
- name: orders
columns:
- name: amount
- name: quantity
-- models/staging/stg_orders.sql
SELECT
order_id,
amount, -- DIRECT
quantity -- DIRECT
FROM {{ source('raw', 'orders') }}
-- models/marts/fct_orders.sql
SELECT
order_id,
amount AS order_amount, -- RENAMED
amount * quantity AS order_total -- DERIVED
FROM {{ ref('stg_orders') }}
# cubes/orders.yml
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

measures:
- name: total_revenue
type: sum
sql: order_total # MEASURE_SOURCE from fct_orders.order_total
# metrics/revenue.yml
metrics:
- name: monthly_revenue
expression: orders.total_revenue # METRIC_SOURCE from orders.total_revenue
time_grain: month

Complete lineage path:

raw.orders.amount
└── stg_orders.amount (DIRECT)
└── fct_orders.order_amount (RENAMED)
└── fct_orders.order_total (DERIVED)
└── orders.total_revenue (MEASURE_SOURCE)
└── monthly_revenue (METRIC_SOURCE)

Querying Lineage via API

Get Column Lineage

Retrieve upstream lineage for a specific column:

curl -X GET "http://localhost:8000/api/v1/lineage/model.fct_orders.order_total?direction=upstream"

Response:

{
"root": {
"artifact_id": "model.fct_orders",
"artifact_type": "model_column",
"artifact_name": "fct_orders",
"column_name": "order_total"
},
"nodes": [
{
"artifact_id": "model.fct_orders",
"artifact_type": "model_column",
"artifact_name": "fct_orders",
"column_name": "order_total"
},
{
"artifact_id": "model.stg_orders",
"artifact_type": "model_column",
"artifact_name": "stg_orders",
"column_name": "amount"
},
{
"artifact_id": "model.stg_orders",
"artifact_type": "model_column",
"artifact_name": "stg_orders",
"column_name": "quantity"
},
{
"artifact_id": "source.raw.orders",
"artifact_type": "source_column",
"artifact_name": "raw.orders",
"column_name": "amount"
},
{
"artifact_id": "source.raw.orders",
"artifact_type": "source_column",
"artifact_name": "raw.orders",
"column_name": "quantity"
}
],
"edges": [
{
"source": {"artifact_name": "stg_orders", "column_name": "amount"},
"target": {"artifact_name": "fct_orders", "column_name": "order_total"},
"edge_type": "derived",
"transformation": "amount * quantity"
},
{
"source": {"artifact_name": "stg_orders", "column_name": "quantity"},
"target": {"artifact_name": "fct_orders", "column_name": "order_total"},
"edge_type": "derived",
"transformation": "amount * quantity"
}
],
"computed_at": "2024-01-15T10:30:00Z",
"depth": 3
}

Get Metric Lineage

Trace a metric back to its source columns:

curl -X GET "http://localhost:8000/api/v1/lineage/metric.monthly_revenue?direction=upstream&max_depth=10"

Get Downstream Consumers

Find all artifacts that consume a source column:

curl -X GET "http://localhost:8000/api/v1/lineage/source.raw.orders.amount?direction=downstream"

Query Parameters

ParameterTypeDefaultDescription
directionstringbothupstream, downstream, or both
columnstringnullSpecific column for column-level lineage
max_depthint10Maximum traversal depth (1-50)
max_nodesint1000Maximum nodes to return (1-10000)

CLI Commands

View Column Lineage

# Trace a column upstream to its sources
olytix-core lineage column fct_orders.order_total --upstream

# Trace a column downstream to its consumers
olytix-core lineage column raw.orders.amount --downstream

# Get full lineage (both directions)
olytix-core lineage column stg_orders.amount

View Metric Lineage

# Trace metric to source columns
olytix-core lineage metric monthly_revenue

Export Lineage

# Export as JSON
olytix-core lineage column fct_orders.order_total --format json > lineage.json

# Export as DOT for visualization
olytix-core lineage metric monthly_revenue --format dot > metric_lineage.dot
dot -Tpng metric_lineage.dot > metric_lineage.png

Including Lineage in Query Responses

Request lineage information with your semantic queries:

{
"query": {
"metrics": ["monthly_revenue"],
"dimensions": ["orders.region"],
"time_dimension": {
"dimension": "orders.order_date",
"granularity": "month"
}
},
"options": {
"include_lineage": true
}
}

Response includes lineage metadata:

{
"data": [...],
"meta": {
"lineage": {
"monthly_revenue": {
"type": "metric",
"sources": [
"source.raw.orders.amount",
"source.raw.orders.quantity"
],
"path": [
"raw.orders.amount → stg_orders.amount → fct_orders.order_total → orders.total_revenue → monthly_revenue"
],
"edge_summary": {
"direct": 2,
"derived": 1,
"measure_source": 1,
"metric_source": 1
}
}
}
}
}

Lineage Computation

Olytix Core automatically extracts column-level lineage using SQL parsing:

From SQL Transformations

The extractor analyzes SELECT statements to identify:

  • Direct column references
  • Aliased columns (renames)
  • Expressions and calculations
  • Aggregation functions
  • JOIN conditions
-- Olytix Core parses this SQL and extracts:
SELECT
o.order_id, -- DIRECT
o.amount AS order_amount, -- RENAMED
o.amount * o.quantity AS total, -- DERIVED (from amount, quantity)
SUM(o.amount) OVER (PARTITION BY c.id), -- AGGREGATED
c.name -- JOINED
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.id

From Cube Definitions

Measure and dimension SQL expressions are parsed to link to source columns:

measures:
- name: total_revenue
type: sum
sql: order_amount # Links to fct_orders.order_amount

From Metric Definitions

Metric expressions reference cube measures, creating the final lineage link:

metrics:
- name: revenue_per_customer
expression: orders.total_revenue / customers.customer_count

Best Practices

1. Write Parseable SQL

Avoid patterns that obscure lineage:

-- Hard to trace (SELECT *)
SELECT * FROM {{ ref('model') }}

-- Easy to trace (explicit columns)
SELECT
order_id,
amount,
customer_id
FROM {{ ref('model') }}

2. Use Meaningful Aliases

Make renamed columns traceable:

-- Unclear rename
SELECT a AS x FROM orders

-- Clear rename
SELECT order_amount AS revenue FROM orders

3. Document Complex Derivations

Add comments for complex calculations:

SELECT
-- Net revenue = gross amount minus refunds and discounts
gross_amount - refund_amount - (gross_amount * discount_pct) AS net_revenue
FROM {{ ref('stg_orders') }}

4. Keep Transformations Shallow

Prefer direct paths over deeply nested transformations:

Preferred: source → staging → mart → cube
Avoid: source → stg1 → stg2 → int1 → int2 → mart → cube

Next Steps