Skip to main content

Lineage Basics

For Data Analysts

Lineage tracks how data flows through your systemβ€”from raw sources through transformations to final metrics. Olytix Core provides column-level lineage, showing exactly which source columns contribute to each metric.

What is Lineage?​

Lineage answers: "Where does this data come from?"

Column-Level Lineage

Track data flow from source to metric. Edges show transformation types.

React Flow mini map
Legend
πŸ—„οΈSource
βš™οΈStaging
πŸ“ŠMart
🧊Cube
🎯Metric

Lineage Levels​

Olytix Core tracks lineage at multiple levels:

Model-Level Lineage​

Which models depend on which:

stg_orders ─────────────┐
β”‚
stg_customers ──────────┼──► fct_orders
β”‚
stg_products β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Column-Level Lineage​

Which columns flow to which:

stg_orders.order_id     ──► fct_orders.order_id
stg_orders.amount ──► fct_orders.order_amount
stg_customers.name ──► fct_orders.customer_name

Measure/Metric Lineage​

Which columns compose measures and metrics:

raw.orders.amount
└──► stg_orders.amount
└──► fct_orders.order_amount
└──► orders.total_revenue (measure)
└──► monthly_revenue (metric)

Lineage Edge Types​

Olytix Core categorizes lineage relationships:

Edge TypeDescriptionExample
DIRECTColumn passed unchangedSELECT id FROM source
RENAMEDColumn renamedSELECT id AS order_id
DERIVEDColumn calculatedSELECT a + b AS total
AGGREGATEDColumn aggregatedSELECT SUM(amount)
JOINEDColumn from joinLEFT JOIN ... ON
FILTEREDColumn used in filterWHERE status = 'active'
MEASURE_SOURCEColumn to measuresql: amount in measure
DIMENSION_SOURCEColumn to dimensionsql: region in dimension
METRIC_SOURCEMeasure to metricexpression: cube.measure

Viewing Lineage​

CLI Commands​

# View lineage for a metric
olytix-core lineage metric monthly_revenue

# View lineage for a column
olytix-core lineage column fct_orders.order_amount

# View lineage for a model
olytix-core lineage model fct_orders

# Export lineage graph
olytix-core lineage --format json > lineage.json

API Endpoints​

# Get metric lineage
curl http://localhost:8000/api/v1/lineage/metrics/monthly_revenue

# Get column lineage
curl http://localhost:8000/api/v1/lineage/columns/fct_orders/order_amount

# Get full lineage graph
curl http://localhost:8000/api/v1/lineage

Example Output​

Lineage: monthly_revenue
═══════════════════════════════════════════════════════════════

metric.monthly_revenue
β”‚
└── METRIC_SOURCE
β”‚
cube.orders.total_revenue (measure: SUM)
β”‚
└── MEASURE_SOURCE
β”‚
model.fct_orders.order_amount
β”‚
└── RENAMED (from: amount)
β”‚
model.stg_orders.amount
β”‚
└── DIRECT
β”‚
source.raw.orders.amount

═══════════════════════════════════════════════════════════════

How Lineage is Extracted​

From SQL​

Olytix Core parses SQL to extract lineage:

SELECT
o.order_id, -- DIRECT from stg_orders.order_id
o.amount AS order_amount, -- RENAMED from stg_orders.amount
o.amount * o.quantity AS total, -- DERIVED from amount, quantity
c.customer_name -- JOINED from stg_customers
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
ON o.customer_id = c.customer_id
WHERE o.status = 'completed' -- FILTERED by status

Extracted lineage:

fct_orders.order_id      ← stg_orders.order_id (DIRECT)
fct_orders.order_amount ← stg_orders.amount (RENAMED)
fct_orders.total ← stg_orders.amount (DERIVED)
fct_orders.total ← stg_orders.quantity (DERIVED)
fct_orders.customer_name ← stg_customers.customer_name (JOINED)

From Cube Definitions​

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

measures:
- name: total_revenue
type: sum
sql: order_amount # MEASURE_SOURCE from fct_orders.order_amount

From Metric Definitions​

metrics:
- name: monthly_revenue
expression: orders.total_revenue # METRIC_SOURCE from cube measure

Upstream vs Downstream​

Upstream Lineage​

"What does this depend on?" (looking backwards)

olytix-core lineage --upstream monthly_revenue
Upstream of: monthly_revenue
═══════════════════════════════════════════════════════════════

Sources:
β€’ raw.orders.amount
β€’ raw.orders.quantity

Staging:
β€’ stg_orders.amount
β€’ stg_orders.quantity

Marts:
β€’ fct_orders.order_amount

Cubes:
β€’ orders.total_revenue

═══════════════════════════════════════════════════════════════

Downstream Lineage​

"What depends on this?" (looking forwards)

olytix-core lineage --downstream raw.orders.amount
Downstream of: raw.orders.amount
═══════════════════════════════════════════════════════════════

Staging:
β€’ stg_orders.amount

Marts:
β€’ fct_orders.order_amount
β€’ fct_orders.total

Cubes:
β€’ orders.total_revenue
β€’ orders.avg_order_value

Metrics:
β€’ monthly_revenue
β€’ avg_order_value
β€’ revenue_growth

═══════════════════════════════════════════════════════════════

Impact Analysis​

Use lineage for impact analysis:

Before Changing a Source Column​

olytix-core impact raw.orders.amount
Impact Analysis: raw.orders.amount
═══════════════════════════════════════════════════════════════

Changing this column will affect:

Models (3):
β€’ stg_orders (DIRECT dependency)
β€’ fct_orders (via stg_orders)
β€’ fct_order_summary (via fct_orders)

Cubes (1):
β€’ orders
- Measures: total_revenue, avg_order_value
- Dimensions: (none)

Metrics (5):
β€’ monthly_revenue
β€’ quarterly_revenue
β€’ avg_order_value
β€’ revenue_growth
β€’ revenue_per_customer

Dashboards (3):
β€’ Executive Dashboard
β€’ Sales Performance
β€’ Finance Overview

Users Affected: 156

⚠ HIGH IMPACT: This change affects critical financial metrics.
Recommend: Review with Finance team before proceeding.

═══════════════════════════════════════════════════════════════

Before Deprecating a Model​

olytix-core impact --deprecate stg_orders

Shows all downstream artifacts that would break.

Lineage Visualization​

Text Format​

raw.orders.amount
β”‚
└── stg_orders.amount (DIRECT)
β”‚
β”œβ”€β”€ fct_orders.order_amount (RENAMED)
β”‚ β”‚
β”‚ └── orders.total_revenue (AGGREGATED)
β”‚ β”‚
β”‚ └── monthly_revenue (METRIC)
β”‚
└── fct_orders.total (DERIVED)
β”‚
└── orders.order_total (AGGREGATED)

JSON Format​

olytix-core lineage --format json monthly_revenue
{
"root": "metric.monthly_revenue",
"edges": [
{
"source": "metric.monthly_revenue",
"target": "cube.orders.total_revenue",
"type": "METRIC_SOURCE"
},
{
"source": "cube.orders.total_revenue",
"target": "model.fct_orders.order_amount",
"type": "MEASURE_SOURCE"
}
]
}

Graphviz DOT Format​

olytix-core lineage --format dot monthly_revenue > lineage.dot
dot -Tpng lineage.dot > lineage.png

Lineage in the API​

Query Lineage​

{
"query": {
"metrics": ["monthly_revenue"],
"dimensions": ["orders.region"]
},
"options": {
"include_lineage": true
}
}

Response includes lineage:

{
"data": [...],
"lineage": {
"monthly_revenue": {
"sources": ["raw.orders.amount"],
"path": [
"raw.orders.amount",
"stg_orders.amount",
"fct_orders.order_amount",
"orders.total_revenue",
"monthly_revenue"
]
}
}
}

Best Practices​

1. Maintain Clean Lineage​

Avoid complex SQL that obscures lineage:

-- Hard to trace
SELECT * FROM (
SELECT * FROM (
SELECT * FROM {{ ref('model') }}
)
)

-- Easy to trace
SELECT
order_id,
amount
FROM {{ ref('model') }}

2. Use Meaningful Names​

Column names should be traceable:

-- Unclear lineage
SELECT a AS x FROM table

-- Clear lineage
SELECT order_amount AS revenue FROM orders

3. Document Derivations​

For complex calculations, add comments:

SELECT
-- Revenue = amount * quantity * (1 - discount)
amount * quantity * (1 - discount_pct) AS revenue
FROM {{ ref('stg_orders') }}

4. Regular Lineage Audits​

Periodically review lineage for:

  • Orphaned columns (no upstream)
  • Dead ends (no downstream)
  • Unexpected connections

Next Steps​

Now that you understand lineage basics:

  1. Explore column-level lineage in depth β†’
  2. Learn impact analysis β†’
  3. Start defining sources β†’