Column-Level Lineage
Column-level lineage tracks exactly how each column flows through your data pipeline, from raw source columns through SQL transformations to semantic layer measures and metrics. This granular tracking enables precise impact analysis and complete data transparency.
Understanding Column Lineage
While model-level lineage shows which models depend on others, column-level lineage reveals the exact column-to-column relationships:
Column-Level Lineage
Track how individual columns flow through transformations.
Column-level lineage answers questions like:
- Which source columns contribute to this metric?
- What happens if I change the data type of a column?
- Where does this measure get its values from?
Lineage Edge Types
Olytix Core categorizes column relationships using edge types that describe the nature of each transformation:
DIRECT
A column passes through unchanged with the same name.
SELECT
order_id, -- DIRECT: stg_orders.order_id → fct_orders.order_id
customer_id -- DIRECT: stg_orders.customer_id → fct_orders.customer_id
FROM {{ ref('stg_orders') }}
RENAMED
A column is renamed using an alias.
SELECT
amount AS order_amount, -- RENAMED: stg_orders.amount → fct_orders.order_amount
created_at AS order_date -- RENAMED: stg_orders.created_at → fct_orders.order_date
FROM {{ ref('stg_orders') }}
DERIVED
A column is calculated from one or more source columns.
SELECT
amount * quantity AS total, -- DERIVED from amount, quantity
amount * (1 - discount_pct) AS net_amount, -- DERIVED from amount, discount_pct
CASE WHEN status = 'paid' THEN amount ELSE 0 END, -- DERIVED from status, amount
COALESCE(shipping_cost, 0) AS shipping -- DERIVED from shipping_cost
FROM {{ ref('stg_orders') }}
AGGREGATED
A column results from an aggregation function.
SELECT
customer_id,
SUM(amount) AS total_spent, -- AGGREGATED from amount
COUNT(*) AS order_count, -- AGGREGATED (no source column)
AVG(amount) AS avg_order_value, -- AGGREGATED from amount
MAX(order_date) AS last_order_date -- AGGREGATED from order_date
FROM {{ ref('stg_orders') }}
GROUP BY customer_id
JOINED
A column originates from a joined table.
SELECT
o.order_id,
o.amount,
c.customer_name, -- JOINED from stg_customers
c.email, -- JOINED from stg_customers
p.product_name -- JOINED from stg_products
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.customer_id
LEFT JOIN {{ ref('stg_products') }} p ON o.product_id = p.product_id
Semantic Layer Edge Types
Additional edge types track relationships in the semantic layer:
| Edge Type | Description | Context |
|---|---|---|
MEASURE_SOURCE | Column flows to a cube measure | Cube measure definition |
DIMENSION_SOURCE | Column flows to a cube dimension | Cube dimension definition |
METRIC_SOURCE | Measure flows to a metric | Metric expression |
FILTERED | Column used in a filter | WHERE clause or metric filter |
Tracing Lineage Through the Stack
End-to-End Example
# sources/raw.yml
sources:
- name: raw
tables:
- name: orders
columns:
- name: amount
- name: quantity
-- models/staging/stg_orders.sql
SELECT
order_id,
amount, -- DIRECT
quantity -- DIRECT
FROM {{ source('raw', 'orders') }}
-- models/marts/fct_orders.sql
SELECT
order_id,
amount AS order_amount, -- RENAMED
amount * quantity AS order_total -- DERIVED
FROM {{ ref('stg_orders') }}
# cubes/orders.yml
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
measures:
- name: total_revenue
type: sum
sql: order_total # MEASURE_SOURCE from fct_orders.order_total
# metrics/revenue.yml
metrics:
- name: monthly_revenue
expression: orders.total_revenue # METRIC_SOURCE from orders.total_revenue
time_grain: month
Complete lineage path:
raw.orders.amount
└── stg_orders.amount (DIRECT)
└── fct_orders.order_amount (RENAMED)
└── fct_orders.order_total (DERIVED)
└── orders.total_revenue (MEASURE_SOURCE)
└── monthly_revenue (METRIC_SOURCE)
Querying Lineage via API
Get Column Lineage
Retrieve upstream lineage for a specific column:
curl -X GET "http://localhost:8000/api/v1/lineage/model.fct_orders.order_total?direction=upstream"
Response:
{
"root": {
"artifact_id": "model.fct_orders",
"artifact_type": "model_column",
"artifact_name": "fct_orders",
"column_name": "order_total"
},
"nodes": [
{
"artifact_id": "model.fct_orders",
"artifact_type": "model_column",
"artifact_name": "fct_orders",
"column_name": "order_total"
},
{
"artifact_id": "model.stg_orders",
"artifact_type": "model_column",
"artifact_name": "stg_orders",
"column_name": "amount"
},
{
"artifact_id": "model.stg_orders",
"artifact_type": "model_column",
"artifact_name": "stg_orders",
"column_name": "quantity"
},
{
"artifact_id": "source.raw.orders",
"artifact_type": "source_column",
"artifact_name": "raw.orders",
"column_name": "amount"
},
{
"artifact_id": "source.raw.orders",
"artifact_type": "source_column",
"artifact_name": "raw.orders",
"column_name": "quantity"
}
],
"edges": [
{
"source": {"artifact_name": "stg_orders", "column_name": "amount"},
"target": {"artifact_name": "fct_orders", "column_name": "order_total"},
"edge_type": "derived",
"transformation": "amount * quantity"
},
{
"source": {"artifact_name": "stg_orders", "column_name": "quantity"},
"target": {"artifact_name": "fct_orders", "column_name": "order_total"},
"edge_type": "derived",
"transformation": "amount * quantity"
}
],
"computed_at": "2024-01-15T10:30:00Z",
"depth": 3
}
Get Metric Lineage
Trace a metric back to its source columns:
curl -X GET "http://localhost:8000/api/v1/lineage/metric.monthly_revenue?direction=upstream&max_depth=10"
Get Downstream Consumers
Find all artifacts that consume a source column:
curl -X GET "http://localhost:8000/api/v1/lineage/source.raw.orders.amount?direction=downstream"
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
direction | string | both | upstream, downstream, or both |
column | string | null | Specific column for column-level lineage |
max_depth | int | 10 | Maximum traversal depth (1-50) |
max_nodes | int | 1000 | Maximum nodes to return (1-10000) |
CLI Commands
View Column Lineage
# Trace a column upstream to its sources
olytix-core lineage column fct_orders.order_total --upstream
# Trace a column downstream to its consumers
olytix-core lineage column raw.orders.amount --downstream
# Get full lineage (both directions)
olytix-core lineage column stg_orders.amount
View Metric Lineage
# Trace metric to source columns
olytix-core lineage metric monthly_revenue
Export Lineage
# Export as JSON
olytix-core lineage column fct_orders.order_total --format json > lineage.json
# Export as DOT for visualization
olytix-core lineage metric monthly_revenue --format dot > metric_lineage.dot
dot -Tpng metric_lineage.dot > metric_lineage.png
Including Lineage in Query Responses
Request lineage information with your semantic queries:
{
"query": {
"metrics": ["monthly_revenue"],
"dimensions": ["orders.region"],
"time_dimension": {
"dimension": "orders.order_date",
"granularity": "month"
}
},
"options": {
"include_lineage": true
}
}
Response includes lineage metadata:
{
"data": [...],
"meta": {
"lineage": {
"monthly_revenue": {
"type": "metric",
"sources": [
"source.raw.orders.amount",
"source.raw.orders.quantity"
],
"path": [
"raw.orders.amount → stg_orders.amount → fct_orders.order_total → orders.total_revenue → monthly_revenue"
],
"edge_summary": {
"direct": 2,
"derived": 1,
"measure_source": 1,
"metric_source": 1
}
}
}
}
}
Lineage Computation
Olytix Core automatically extracts column-level lineage using SQL parsing:
From SQL Transformations
The extractor analyzes SELECT statements to identify:
- Direct column references
- Aliased columns (renames)
- Expressions and calculations
- Aggregation functions
- JOIN conditions
-- Olytix Core parses this SQL and extracts:
SELECT
o.order_id, -- DIRECT
o.amount AS order_amount, -- RENAMED
o.amount * o.quantity AS total, -- DERIVED (from amount, quantity)
SUM(o.amount) OVER (PARTITION BY c.id), -- AGGREGATED
c.name -- JOINED
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.id
From Cube Definitions
Measure and dimension SQL expressions are parsed to link to source columns:
measures:
- name: total_revenue
type: sum
sql: order_amount # Links to fct_orders.order_amount
From Metric Definitions
Metric expressions reference cube measures, creating the final lineage link:
metrics:
- name: revenue_per_customer
expression: orders.total_revenue / customers.customer_count
Best Practices
1. Write Parseable SQL
Avoid patterns that obscure lineage:
-- Hard to trace (SELECT *)
SELECT * FROM {{ ref('model') }}
-- Easy to trace (explicit columns)
SELECT
order_id,
amount,
customer_id
FROM {{ ref('model') }}
2. Use Meaningful Aliases
Make renamed columns traceable:
-- Unclear rename
SELECT a AS x FROM orders
-- Clear rename
SELECT order_amount AS revenue FROM orders
3. Document Complex Derivations
Add comments for complex calculations:
SELECT
-- Net revenue = gross amount minus refunds and discounts
gross_amount - refund_amount - (gross_amount * discount_pct) AS net_revenue
FROM {{ ref('stg_orders') }}
4. Keep Transformations Shallow
Prefer direct paths over deeply nested transformations:
Preferred: source → staging → mart → cube
Avoid: source → stg1 → stg2 → int1 → int2 → mart → cube
Next Steps
- Impact Analysis - Analyze the downstream effects of column changes
- Dependency Graph - Understand model-level dependencies
- Data Testing - Validate data quality across the lineage