Materializations
Materializations determine how Olytix Core persists your model's results in the warehouse. Choosing the right materialization impacts query performance, build time, and storage costs.
Overview
Olytix Core supports four materialization types:
| Type | Persistence | Build Time | Query Speed | Storage |
|---|---|---|---|---|
view | Virtual | Fast | Slower | None |
table | Physical | Medium | Fast | Full |
incremental | Physical | Fastest | Fast | Full |
ephemeral | None (CTE) | None | N/A | None |
View Materialization
A view is a virtual table defined by a SQL query. The query runs each time the view is accessed.
{{ config(materialized='view') }}
SELECT
order_id,
customer_id,
order_date,
total_amount
FROM {{ source('raw', 'orders') }}
WHERE status = 'completed'
When to Use Views
- Staging models: Light transformations on source data
- Small datasets: Under 1 million rows
- Always-fresh requirements: Data must reflect real-time state
- Simple transformations: No complex aggregations or joins
Advantages
- No storage cost
- Always current data
- Fast build time (just creates view definition)
- Simple to understand and debug
Disadvantages
- Slower query performance on large data
- Query runs every time view is accessed
- Complex views can timeout
Configuration
{{ config(
materialized='view',
schema='staging',
alias='v_orders_cleaned'
) }}
Table Materialization
A table stores the query results physically. The entire table rebuilds on each run.
{{ config(materialized='table') }}
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS lifetime_value,
MIN(order_date) AS first_order_date,
MAX(order_date) AS last_order_date
FROM {{ ref('stg_orders') }}
GROUP BY customer_id
When to Use Tables
- Mart models: Final fact and dimension tables
- Complex aggregations: Heavy GROUP BY operations
- Large joins: Multiple table joins
- Frequently queried data: Dashboards and reports
- Data > 1 million rows: Performance-critical queries
Advantages
- Fast query performance
- Query results are pre-computed
- Consistent query times regardless of complexity
Disadvantages
- Higher storage cost
- Longer build time (full refresh)
- Data staleness between builds
Configuration
{{ config(
materialized='table',
schema='analytics',
alias='fct_customer_orders',
post_hook=[
"CREATE INDEX idx_customer_id ON {{ this }} (customer_id)",
"ANALYZE {{ this }}"
]
) }}
Table Options
{{ config(
materialized='table',
-- PostgreSQL partitioning
partition_by={
"field": "order_date",
"data_type": "date",
"granularity": "month"
},
-- Clustering (warehouse-specific)
cluster_by=['customer_region', 'order_date'],
-- Distribution (warehouse-specific)
dist='customer_id'
) }}
Incremental Materialization
Incremental models only process new or changed data, appending to or updating an existing table.
{{ config(
materialized='incremental',
unique_key='event_id'
) }}
SELECT
event_id,
user_id,
event_type,
event_timestamp,
event_properties
FROM {{ source('raw', 'events') }}
{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
When to Use Incremental
- Large fact tables: Billions of rows
- Append-only data: Event logs, transactions
- Time-series data: Data with clear timestamps
- Long-running full refreshes: Tables that take hours to rebuild
Advantages
- Much faster builds (only processes new data)
- Reduced warehouse load
- Lower compute costs
Disadvantages
- More complex logic
- Risk of data drift if not configured correctly
- Requires careful handling of late-arriving data
Configuration Options
{{ config(
materialized='incremental',
unique_key='order_id', -- For merge/update operations
incremental_strategy='merge', -- merge, append, delete+insert
on_schema_change='sync_all_columns', -- Handle schema evolution
incremental_predicates=[
"order_date >= CURRENT_DATE - 7" -- Optimize merge performance
]
) }}
See Incremental Models for detailed strategies.
Ephemeral Materialization
Ephemeral models are not persisted. They compile to CTEs (Common Table Expressions) in downstream models.
-- models/intermediate/int_order_totals.sql
{{ config(materialized='ephemeral') }}
SELECT
order_id,
SUM(quantity * unit_price) AS order_total,
SUM(quantity * unit_price * discount_pct / 100) AS discount_amount
FROM {{ ref('stg_order_items') }}
GROUP BY order_id
When referenced by another model:
-- models/marts/fct_orders.sql
SELECT
o.order_id,
o.customer_id,
t.order_total,
t.discount_amount
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('int_order_totals') }} t
ON o.order_id = t.order_id
Compiles to:
WITH int_order_totals AS (
SELECT
order_id,
SUM(quantity * unit_price) AS order_total,
SUM(quantity * unit_price * discount_pct / 100) AS discount_amount
FROM staging.stg_order_items
GROUP BY order_id
)
SELECT
o.order_id,
o.customer_id,
t.order_total,
t.discount_amount
FROM staging.stg_orders o
LEFT JOIN int_order_totals t
ON o.order_id = t.order_id
When to Use Ephemeral
- Intermediate transformations: Logic used by multiple models
- Abstract complexity: Break down complex models
- No direct querying needed: Only used as building blocks
- Reduce warehouse objects: Minimize tables/views
Advantages
- No storage cost
- No database objects created
- Useful for code organization
Disadvantages
- Cannot query directly
- Recomputed in every downstream model
- Complex debugging (must trace through CTEs)
Configuration
{{ config(
materialized='ephemeral'
) }}
Choosing a Materialization
Use this decision tree:
Materialization Decision Tree
Follow the questions to choose the right materialization strategy.
Quick Reference
| Scenario | Materialization |
|---|---|
| Staging layer, 1:1 with source | view |
| Complex aggregation for dashboards | table |
| Event log with millions of rows daily | incremental |
| Reusable intermediate logic | ephemeral |
| Dimension table with slow-changing data | table |
| Real-time reporting requirement | view |
| Large fact table with timestamp | incremental |
Project-Level Configuration
Set materialization defaults by folder:
# olytix-core_project.yml
models:
my_project:
# Default for all models
+materialized: view
staging:
# All staging models are views
+materialized: view
+schema: staging
intermediate:
# Intermediate models don't persist
+materialized: ephemeral
marts:
# Mart models are tables
+materialized: table
+schema: analytics
# Facts use incremental where possible
facts:
+materialized: incremental
Materialization by Warehouse
Some options are warehouse-specific:
PostgreSQL
{{ config(
materialized='table',
indexes=[
{'columns': ['customer_id'], 'type': 'btree'},
{'columns': ['order_date'], 'type': 'btree'}
]
) }}
BigQuery
{{ config(
materialized='table',
partition_by={
"field": "order_date",
"data_type": "date"
},
cluster_by=['customer_region']
) }}
Snowflake
{{ config(
materialized='table',
cluster_by=['order_date', 'customer_region'],
transient=true
) }}
Full Refresh
Force a full rebuild of incremental models:
# Rebuild specific model from scratch
olytix-core run --select fct_events --full-refresh
# Rebuild all incremental models
olytix-core run --full-refresh
Use full refresh when:
- Schema changes require rebuilding
- Data quality issues need correction
- Historical data reprocessing needed
Best Practices
1. Start Simple
Begin with views, graduate to tables/incremental as needed:
Development: view → Test with table → Production: incremental
2. Monitor Build Times
Track which models are slow:
olytix-core run --select +fct_orders --timing
3. Balance Storage and Performance
| Priority | Strategy |
|---|---|
| Query speed critical | Use tables |
| Storage limited | Use views, ephemeral |
| Large data volume | Use incremental |
4. Document Choices
Explain why you chose a materialization:
{{ config(
materialized='incremental',
unique_key='event_id'
) }}
{#
Using incremental because:
- Table has 500M+ rows
- ~1M new events daily
- Full refresh takes 4+ hours
#}
Next Steps
- Learn incremental strategies - Deep dive into merge, append, and delete+insert
- Understand the dependency graph - How models connect
- Define cubes - Build semantic layer on materialized models