Materializations

For Data Analysts

Materializations determine how Olytix Core persists your model's results in the warehouse. Choosing the right materialization impacts query performance, build time, and storage costs.

Overview

Olytix Core supports four materialization types:

Type	Persistence	Build Time	Query Speed	Storage
`view`	Virtual	Fast	Slower	None
`table`	Physical	Medium	Fast	Full
`incremental`	Physical	Fastest	Fast	Full
`ephemeral`	None (CTE)	None	N/A	None

View Materialization

A view is a virtual table defined by a SQL query. The query runs each time the view is accessed.

{{ config(materialized='view') }}

SELECT
    order_id,
    customer_id,
    order_date,
    total_amount
FROM {{ source('raw', 'orders') }}
WHERE status = 'completed'

When to Use Views

Staging models: Light transformations on source data
Small datasets: Under 1 million rows
Always-fresh requirements: Data must reflect real-time state
Simple transformations: No complex aggregations or joins

Advantages

No storage cost
Always current data
Fast build time (just creates view definition)
Simple to understand and debug

Disadvantages

Slower query performance on large data
Query runs every time view is accessed
Complex views can timeout

Configuration

{{ config(
    materialized='view',
    schema='staging',
    alias='v_orders_cleaned'
) }}

Table Materialization

A table stores the query results physically. The entire table rebuilds on each run.

{{ config(materialized='table') }}

SELECT
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS lifetime_value,
    MIN(order_date) AS first_order_date,
    MAX(order_date) AS last_order_date
FROM {{ ref('stg_orders') }}
GROUP BY customer_id

When to Use Tables

Mart models: Final fact and dimension tables
Complex aggregations: Heavy GROUP BY operations
Large joins: Multiple table joins
Frequently queried data: Dashboards and reports
Data > 1 million rows: Performance-critical queries

Advantages

Fast query performance
Query results are pre-computed
Consistent query times regardless of complexity

Disadvantages

Higher storage cost
Longer build time (full refresh)
Data staleness between builds

Configuration

{{ config(
    materialized='table',
    schema='analytics',
    alias='fct_customer_orders',
    post_hook=[
        "CREATE INDEX idx_customer_id ON {{ this }} (customer_id)",
        "ANALYZE {{ this }}"
    ]
) }}

Table Options

{{ config(
    materialized='table',

    -- PostgreSQL partitioning
    partition_by={
        "field": "order_date",
        "data_type": "date",
        "granularity": "month"
    },

    -- Clustering (warehouse-specific)
    cluster_by=['customer_region', 'order_date'],

    -- Distribution (warehouse-specific)
    dist='customer_id'
) }}

Incremental Materialization

Incremental models only process new or changed data, appending to or updating an existing table.

{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

SELECT
    event_id,
    user_id,
    event_type,
    event_timestamp,
    event_properties
FROM {{ source('raw', 'events') }}

{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}

When to Use Incremental

Large fact tables: Billions of rows
Append-only data: Event logs, transactions
Time-series data: Data with clear timestamps
Long-running full refreshes: Tables that take hours to rebuild

Advantages

Much faster builds (only processes new data)
Reduced warehouse load
Lower compute costs

Disadvantages

More complex logic
Risk of data drift if not configured correctly
Requires careful handling of late-arriving data

Configuration Options

{{ config(
    materialized='incremental',
    unique_key='order_id',                    -- For merge/update operations
    incremental_strategy='merge',              -- merge, append, delete+insert
    on_schema_change='sync_all_columns',      -- Handle schema evolution
    incremental_predicates=[
        "order_date >= CURRENT_DATE - 7"      -- Optimize merge performance
    ]
) }}

See Incremental Models for detailed strategies.

Ephemeral Materialization

Ephemeral models are not persisted. They compile to CTEs (Common Table Expressions) in downstream models.

-- models/intermediate/int_order_totals.sql
{{ config(materialized='ephemeral') }}

SELECT
    order_id,
    SUM(quantity * unit_price) AS order_total,
    SUM(quantity * unit_price * discount_pct / 100) AS discount_amount
FROM {{ ref('stg_order_items') }}
GROUP BY order_id

When referenced by another model:

-- models/marts/fct_orders.sql
SELECT
    o.order_id,
    o.customer_id,
    t.order_total,
    t.discount_amount
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('int_order_totals') }} t
    ON o.order_id = t.order_id

Compiles to:

WITH int_order_totals AS (
    SELECT
        order_id,
        SUM(quantity * unit_price) AS order_total,
        SUM(quantity * unit_price * discount_pct / 100) AS discount_amount
    FROM staging.stg_order_items
    GROUP BY order_id
)
SELECT
    o.order_id,
    o.customer_id,
    t.order_total,
    t.discount_amount
FROM staging.stg_orders o
LEFT JOIN int_order_totals t
    ON o.order_id = t.order_id

When to Use Ephemeral

Intermediate transformations: Logic used by multiple models
Abstract complexity: Break down complex models
No direct querying needed: Only used as building blocks
Reduce warehouse objects: Minimize tables/views

Advantages

No storage cost
No database objects created
Useful for code organization

Disadvantages

Cannot query directly
Recomputed in every downstream model
Complex debugging (must trace through CTEs)

Configuration

{{ config(
    materialized='ephemeral'
) }}

Choosing a Materialization

Use this decision tree:

Materialization Decision Tree

Follow the questions to choose the right materialization strategy.

Legend

Decision

Yes Path

No Path

view

table

incremental

ephemeral

Quick Reference

Scenario	Materialization
Staging layer, 1:1 with source	`view`
Complex aggregation for dashboards	`table`
Event log with millions of rows daily	`incremental`
Reusable intermediate logic	`ephemeral`
Dimension table with slow-changing data	`table`
Real-time reporting requirement	`view`
Large fact table with timestamp	`incremental`

Project-Level Configuration

Set materialization defaults by folder:

# olytix-core_project.yml
models:
  my_project:
    # Default for all models
    +materialized: view

    staging:
      # All staging models are views
      +materialized: view
      +schema: staging

    intermediate:
      # Intermediate models don't persist
      +materialized: ephemeral

    marts:
      # Mart models are tables
      +materialized: table
      +schema: analytics

      # Facts use incremental where possible
      facts:
        +materialized: incremental

Materialization by Warehouse

Some options are warehouse-specific:

PostgreSQL

{{ config(
    materialized='table',
    indexes=[
        {'columns': ['customer_id'], 'type': 'btree'},
        {'columns': ['order_date'], 'type': 'btree'}
    ]
) }}

BigQuery

{{ config(
    materialized='table',
    partition_by={
        "field": "order_date",
        "data_type": "date"
    },
    cluster_by=['customer_region']
) }}

Snowflake

{{ config(
    materialized='table',
    cluster_by=['order_date', 'customer_region'],
    transient=true
) }}

Full Refresh

Force a full rebuild of incremental models:

# Rebuild specific model from scratch
olytix-core run --select fct_events --full-refresh

# Rebuild all incremental models
olytix-core run --full-refresh

Use full refresh when:

Schema changes require rebuilding
Data quality issues need correction
Historical data reprocessing needed

Best Practices

1. Start Simple

Begin with views, graduate to tables/incremental as needed:

Development: view → Test with table → Production: incremental

2. Monitor Build Times

Track which models are slow:

olytix-core run --select +fct_orders --timing

3. Balance Storage and Performance

Priority	Strategy
Query speed critical	Use tables
Storage limited	Use views, ephemeral
Large data volume	Use incremental

4. Document Choices

Explain why you chose a materialization:

{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

{#
    Using incremental because:
    - Table has 500M+ rows
    - ~1M new events daily
    - Full refresh takes 4+ hours
#}

Next Steps

Learn incremental strategies - Deep dive into merge, append, and delete+insert
Understand the dependency graph - How models connect
Define cubes - Build semantic layer on materialized models

Overview​

View Materialization​

When to Use Views​

Advantages​

Disadvantages​

Configuration​

Table Materialization​

When to Use Tables​

Advantages​

Disadvantages​

Configuration​

Table Options​

Incremental Materialization​

When to Use Incremental​

Advantages​

Disadvantages​

Configuration Options​

Ephemeral Materialization​

When to Use Ephemeral​

Advantages​

Disadvantages​

Configuration​

Choosing a Materialization​

Materialization Decision Tree

Quick Reference​

Project-Level Configuration​

Materialization by Warehouse​

PostgreSQL​

BigQuery​

Snowflake​

Full Refresh​

Best Practices​

1. Start Simple​

2. Monitor Build Times​

3. Balance Storage and Performance​

4. Document Choices​

Next Steps​

Overview

View Materialization

When to Use Views

Advantages

Disadvantages

Configuration

Table Materialization

When to Use Tables

Advantages

Disadvantages

Configuration

Table Options

Incremental Materialization

When to Use Incremental

Advantages

Disadvantages

Configuration Options

Ephemeral Materialization

When to Use Ephemeral

Advantages

Disadvantages

Configuration

Choosing a Materialization

Quick Reference

Project-Level Configuration

Materialization by Warehouse

PostgreSQL

BigQuery

Snowflake

Full Refresh

Best Practices

1. Start Simple

2. Monitor Build Times

3. Balance Storage and Performance

4. Document Choices

Next Steps