Models Schema
Models are SQL transformations that transform raw source data into analytics-ready datasets. This reference documents all fields available when defining models.
File Structure
Models consist of two file types:
- SQL files (
.sql): Contain the transformation logic - YAML files (
.yml): Contain metadata, documentation, and tests
my-project/
└── models/
├── staging/
│ ├── stg_orders.sql
│ └── staging.yml
└── marts/
├── fct_orders.sql
├── dim_customers.sql
└── marts.yml
SQL File Structure
Model SQL files support Jinja templating with special functions:
{{ config(materialized='table') }}
SELECT
id AS order_id,
customer_id,
total_amount,
created_at AS order_date
FROM {{ source('raw', 'orders') }}
WHERE status = 'completed'
Inline Configuration
Use the config() block at the top of SQL files:
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
schema='marts'
) }}
YAML Schema
Top-Level Structure
version: 2
models:
- name: model_name
# ... model configuration
Model Definition
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | - | Model name (must match SQL filename without extension) |
description | string | No | null | Human-readable description |
config | object | No | {} | Model configuration options |
columns | list | No | [] | Column definitions and tests |
meta | object | No | {} | Custom metadata key-value pairs |
Example
version: 2
models:
- name: fct_orders
description: Fact table containing completed orders with enriched attributes
config:
materialized: table
schema: marts
columns:
- name: order_id
description: Primary key
tests:
- unique
- not_null
Configuration Options
The config object supports the following fields:
| Field | Type | Default | Description |
|---|---|---|---|
materialized | string | view | How the model is built: view, table, incremental, ephemeral |
schema | string | null | Target schema override |
tags | list | [] | Tags for organizing and selecting models |
enabled | boolean | true | Whether the model is enabled |
pre_hook | list | [] | SQL statements to run before model creation |
post_hook | list | [] | SQL statements to run after model creation |
unique_key | string or list | null | Unique key column(s) for incremental models |
incremental_strategy | string | null | Strategy for incremental updates |
on_schema_change | string | ignore | Behavior when schema changes |
Materialization Types
| Type | Description |
|---|---|
view | Creates a database view (default) |
table | Creates a physical table, rebuilt on each run |
incremental | Appends or merges new data into existing table |
ephemeral | Inlined as a CTE, not materialized in database |
Incremental Strategies
| Strategy | Description |
|---|---|
append | Insert new rows only |
merge | Upsert based on unique key |
delete+insert | Delete matching rows, then insert |
Schema Change Behavior
| Value | Description |
|---|---|
ignore | Ignore schema changes (default) |
append_new_columns | Add new columns, keep existing |
sync_all_columns | Sync schema to match model |
fail | Fail if schema changes detected |
Configuration Example
models:
- name: fct_orders
config:
materialized: incremental
unique_key: order_id
incremental_strategy: merge
on_schema_change: append_new_columns
schema: marts
tags:
- daily
- revenue
pre_hook:
- "{{ log('Starting fct_orders build', info=True) }}"
post_hook:
- "ANALYZE {{ this }}"
Column Definition
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | - | Column name |
description | string | No | null | Human-readable description |
data_type | string | No | null | SQL data type |
tests | list | No | [] | Tests to run on this column |
meta | object | No | {} | Custom metadata key-value pairs |
Example
columns:
- name: order_id
description: Unique order identifier
data_type: INTEGER
tests:
- unique
- not_null
- name: customer_id
description: Reference to dim_customers
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- name: order_status
description: Current status of the order
tests:
- accepted_values:
values: ['pending', 'shipped', 'delivered', 'cancelled']
meta:
business_owner: sales_ops
Column Tests
| Test | Description | Parameters |
|---|---|---|
unique | Values must be unique | - |
not_null | Values cannot be null | - |
accepted_values | Values from predefined list | values: list of allowed values |
relationships | Foreign key validation | to: target ref, field: target column |
Template Functions
ref()
Reference another model:
SELECT *
FROM {{ ref('stg_orders') }}
source()
Reference a source table:
SELECT *
FROM {{ source('raw', 'orders') }}
config()
Inline configuration:
{{ config(materialized='table', schema='marts') }}
Complete Example
SQL File: models/marts/fct_orders.sql
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
schema='marts'
) }}
WITH orders AS (
SELECT *
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
),
customers AS (
SELECT *
FROM {{ ref('dim_customers') }}
)
SELECT
o.order_id,
o.customer_id,
c.customer_name,
c.customer_segment,
o.order_date,
o.total_amount,
o.status,
o.updated_at
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
YAML File: models/marts/marts.yml
version: 2
models:
- name: fct_orders
description: |
Fact table containing all orders with customer attributes.
Updated incrementally based on the updated_at timestamp.
config:
materialized: incremental
unique_key: order_id
tags:
- revenue
- daily
meta:
owner: analytics-team
refresh_frequency: hourly
columns:
- name: order_id
description: Primary key - unique order identifier
tests:
- unique
- not_null
- name: customer_id
description: Foreign key to dim_customers
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- name: customer_name
description: Customer full name
- name: customer_segment
description: Customer segment classification
tests:
- accepted_values:
values: ['enterprise', 'mid-market', 'smb', 'consumer']
- name: order_date
description: Date when the order was placed
tests:
- not_null
- name: total_amount
description: Total order value in USD
meta:
format: currency
- name: status
description: Current order status
tests:
- accepted_values:
values: ['pending', 'completed', 'cancelled', 'refunded']
- name: updated_at
description: Last update timestamp
tests:
- not_null
- name: dim_customers
description: Customer dimension with current attributes
columns:
- name: customer_id
description: Primary key
tests:
- unique
- not_null
- name: customer_name
description: Customer full name
- name: customer_segment
description: Customer segment classification
Best Practices
-
Use consistent naming: Prefix staging models with
stg_, fact tables withfct_, and dimensions withdim_. -
Document all columns: Provide descriptions for every column in your YAML files.
-
Add tests: Include at minimum
uniqueandnot_nulltests on primary keys. -
Use incremental where possible: For large tables, use incremental materialization to improve performance.
-
Organize by layer: Group models in directories by transformation layer (
staging/,intermediate/,marts/). -
Tag for selection: Use tags to organize models for selective runs (
olytix-core run --select tag:daily).