
Materializations


Materializations determine how Olytix Core persists your model's results in the warehouse. Choosing the right materialization impacts query performance, build time, and storage costs.

Overview

Olytix Core supports four materialization types:

| Type | Persistence | Build Time | Query Speed | Storage |
| --- | --- | --- | --- | --- |
| view | Virtual | Fast | Slower | None |
| table | Physical | Medium | Fast | Full |
| incremental | Physical | Fast (after the initial build) | Fast | Full |
| ephemeral | None (compiled to a CTE) | None | N/A | None |

View Materialization

A view is a virtual table defined by a SQL query. The query runs each time the view is accessed.

{{ config(materialized='view') }}

SELECT
    order_id,
    customer_id,
    order_date,
    total_amount
FROM {{ source('raw', 'orders') }}
WHERE status = 'completed'

When to Use Views

  • Staging models: Light transformations on source data
  • Small datasets: Under 1 million rows
  • Always-fresh requirements: Data must reflect real-time state
  • Simple transformations: No complex aggregations or joins

Advantages

  • No storage cost
  • Always current data
  • Fast build time (just creates view definition)
  • Simple to understand and debug

Disadvantages

  • Slower query performance on large data
  • Query runs every time view is accessed
  • Complex views can timeout

Configuration

{{ config(
    materialized='view',
    schema='staging',
    alias='v_orders_cleaned'
) }}

Table Materialization

A table stores the query results physically. The entire table rebuilds on each run.

{{ config(materialized='table') }}

SELECT
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS lifetime_value,
    MIN(order_date) AS first_order_date,
    MAX(order_date) AS last_order_date
FROM {{ ref('stg_orders') }}
GROUP BY customer_id

When to Use Tables

  • Mart models: Final fact and dimension tables
  • Complex aggregations: Heavy GROUP BY operations
  • Large joins: Multiple table joins
  • Frequently queried data: Dashboards and reports
  • Data > 1 million rows: Performance-critical queries

Advantages

  • Fast query performance
  • Query results are pre-computed
  • Consistent query times regardless of complexity

Disadvantages

  • Higher storage cost
  • Longer build time (full refresh)
  • Data staleness between builds

Configuration

{{ config(
    materialized='table',
    schema='analytics',
    alias='fct_customer_orders',
    post_hook=[
        "CREATE INDEX idx_customer_id ON {{ this }} (customer_id)",
        "ANALYZE {{ this }}"
    ]
) }}

Table Options

{{ config(
    materialized='table',

    -- Partitioning (warehouse-specific; this dict form is BigQuery-style)
    partition_by={
        "field": "order_date",
        "data_type": "date",
        "granularity": "month"
    },

    -- Clustering (warehouse-specific)
    cluster_by=['customer_region', 'order_date'],

    -- Distribution (warehouse-specific)
    dist='customer_id'
) }}

Incremental Materialization

Incremental models only process new or changed data, appending to or updating an existing table.

{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

SELECT
    event_id,
    user_id,
    event_type,
    event_timestamp,
    event_properties
FROM {{ source('raw', 'events') }}

{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}

When to Use Incremental

  • Large fact tables: Billions of rows
  • Append-only data: Event logs, transactions
  • Time-series data: Data with clear timestamps
  • Long-running full refreshes: Tables that take hours to rebuild

Advantages

  • Much faster builds (only processes new data)
  • Reduced warehouse load
  • Lower compute costs

Disadvantages

  • More complex logic
  • Risk of data drift if not configured correctly
  • Requires careful handling of late-arriving data
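A common mitigation for late-arriving data is a lookback window: instead of filtering strictly on the current maximum timestamp, reprocess a few trailing days and let the merge on `unique_key` deduplicate rows that were already loaded. A minimal sketch (the 3-day window is an arbitrary choice to tune to how late your data arrives, and the `INTERVAL` syntax varies by warehouse):

```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge'
) }}

SELECT
    event_id,
    user_id,
    event_type,
    event_timestamp
FROM {{ source('raw', 'events') }}

{% if is_incremental() %}
-- Reprocess a trailing window so late-arriving events are still captured;
-- rows already present are updated rather than duplicated via unique_key.
WHERE event_timestamp > (
    SELECT MAX(event_timestamp) - INTERVAL '3 days' FROM {{ this }}
)
{% endif %}
```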

Configuration Options

{{ config(
    materialized='incremental',
    unique_key='order_id',                  -- for merge/update operations
    incremental_strategy='merge',           -- merge, append, or delete+insert
    on_schema_change='sync_all_columns',    -- handle schema evolution
    incremental_predicates=[
        "order_date >= CURRENT_DATE - 7"    -- optimize merge performance
    ]
) }}

See Incremental Models for detailed strategies.

Ephemeral Materialization

Ephemeral models are not persisted. They compile to CTEs (Common Table Expressions) in downstream models.

-- models/intermediate/int_order_totals.sql
{{ config(materialized='ephemeral') }}

SELECT
    order_id,
    SUM(quantity * unit_price) AS order_total,
    SUM(quantity * unit_price * discount_pct / 100) AS discount_amount
FROM {{ ref('stg_order_items') }}
GROUP BY order_id

When referenced by another model:

-- models/marts/fct_orders.sql
SELECT
    o.order_id,
    o.customer_id,
    t.order_total,
    t.discount_amount
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('int_order_totals') }} t
    ON o.order_id = t.order_id

Compiles to:

WITH int_order_totals AS (
    SELECT
        order_id,
        SUM(quantity * unit_price) AS order_total,
        SUM(quantity * unit_price * discount_pct / 100) AS discount_amount
    FROM staging.stg_order_items
    GROUP BY order_id
)
SELECT
    o.order_id,
    o.customer_id,
    t.order_total,
    t.discount_amount
FROM staging.stg_orders o
LEFT JOIN int_order_totals t
    ON o.order_id = t.order_id

When to Use Ephemeral

  • Intermediate transformations: Logic used by multiple models
  • Abstracting complexity: Break complex models into smaller, readable pieces
  • No direct querying needed: Only used as building blocks
  • Reduce warehouse objects: Minimize tables/views

Advantages

  • No storage cost
  • No database objects created
  • Useful for code organization

Disadvantages

  • Cannot query directly
  • Recomputed in every downstream model
  • Complex debugging (must trace through CTEs)

Configuration

{{ config(materialized='ephemeral') }}

Choosing a Materialization

Use this decision tree:

Materialization Decision Tree

[Decision-tree diagram: answer the yes/no questions to arrive at one of view, table, incremental, or ephemeral.]

Quick Reference

| Scenario | Materialization |
| --- | --- |
| Staging layer, 1:1 with source | view |
| Complex aggregation for dashboards | table |
| Event log with millions of rows daily | incremental |
| Reusable intermediate logic | ephemeral |
| Dimension table with slowly changing data | table |
| Real-time reporting requirement | view |
| Large fact table with timestamps | incremental |

Project-Level Configuration

Set materialization defaults by folder:

# olytix-core_project.yml
models:
  my_project:
    # Default for all models
    +materialized: view

    staging:
      # All staging models are views
      +materialized: view
      +schema: staging

    intermediate:
      # Intermediate models don't persist
      +materialized: ephemeral

    marts:
      # Mart models are tables
      +materialized: table
      +schema: analytics

      # Facts use incremental where possible
      facts:
        +materialized: incremental
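Folder-level defaults can be overridden in an individual model file. Assuming Olytix Core follows the usual precedence where an in-model config() beats the project file, a heavy staging model can opt out of the folder-wide view default (the model name here is hypothetical):

```sql
-- models/staging/stg_large_events.sql
-- Overrides the staging folder's +materialized: view default
{{ config(materialized='table') }}

SELECT * FROM {{ source('raw', 'events') }}
```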

Materialization by Warehouse

Some options are warehouse-specific:

PostgreSQL

{{ config(
    materialized='table',
    indexes=[
        {'columns': ['customer_id'], 'type': 'btree'},
        {'columns': ['order_date'], 'type': 'btree'}
    ]
) }}

BigQuery

{{ config(
    materialized='table',
    partition_by={
        "field": "order_date",
        "data_type": "date"
    },
    cluster_by=['customer_region']
) }}

Snowflake

{{ config(
    materialized='table',
    cluster_by=['order_date', 'customer_region'],
    transient=true
) }}

Full Refresh

Force a full rebuild of incremental models:

# Rebuild specific model from scratch
olytix-core run --select fct_events --full-refresh

# Rebuild all incremental models
olytix-core run --full-refresh

Use full refresh when:

  • Schema changes require rebuilding
  • Data quality issues need correction
  • Historical data reprocessing needed

Best Practices

1. Start Simple

Begin with views, graduate to tables/incremental as needed:

Development: view → Test with table → Production: incremental
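One way to follow this progression without maintaining separate model files is to key the materialization off the target environment. This sketch assumes Olytix Core exposes a dbt-style `target` variable in Jinja; treat that as an assumption, not a documented API:

```sql
{{ config(
    -- view in dev for fast iteration; incremental in prod (assumed target.name variable)
    materialized=('incremental' if target.name == 'prod' else 'view'),
    unique_key='order_id'
) }}
```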

2. Monitor Build Times

Track which models are slow:

olytix-core run --select +fct_orders --timing

3. Balance Storage and Performance

| Priority | Strategy |
| --- | --- |
| Query speed critical | Use tables |
| Storage limited | Use views or ephemeral |
| Large data volume | Use incremental |

4. Document Choices

Explain why you chose a materialization:

{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

{#
    Using incremental because:
    - Table has 500M+ rows
    - ~1M new events daily
    - Full refresh takes 4+ hours
#}

Next Steps

  1. Learn incremental strategies - Deep dive into merge, append, and delete+insert
  2. Understand the dependency graph - How models connect
  3. Define cubes - Build semantic layer on materialized models