Model Basics
Models are SQL transformations that convert raw data into analytics-ready tables. Olytix Core models use Jinja templating for dynamic SQL generation and dependency management.
What is a Model?
A model is a SQL SELECT statement stored in a .sql file within the models/ directory:
models/
├── staging/
│ ├── stg_orders.sql
│ └── stg_customers.sql
├── intermediate/
│ └── int_order_items.sql
└── marts/
├── fct_orders.sql
└── dim_customers.sql
When you run olytix-core run, each model compiles to SQL and executes against your warehouse, creating a view or table.
Basic Model Structure
A model consists of an optional config block followed by SQL:
{{ config(materialized='table') }}
SELECT
order_id,
customer_id,
order_date,
total_amount
FROM {{ source('raw', 'orders') }}
WHERE status = 'completed'
The Config Block
The {{ config() }} block defines model behavior:
{{ config(
materialized='table',
schema='analytics',
alias='orders_fact',
tags=['daily', 'finance'],
enabled=true
) }}
Config Options
| Option | Type | Description |
|---|---|---|
materialized | string | How to persist the model (view, table, incremental, ephemeral) |
schema | string | Target schema override |
alias | string | Override the table/view name |
tags | list | Tags for selection and organization |
enabled | boolean | Whether to include in runs (default: true) |
pre_hook | list | SQL to run before model |
post_hook | list | SQL to run after model |
Config Inheritance
Set defaults in olytix-core_project.yml:
models:
my_project:
+materialized: view
staging:
+schema: staging
marts:
+materialized: table
+schema: analytics
File-level config overrides project defaults:
-- This model uses 'incremental' despite project default of 'table'
{{ config(materialized='incremental') }}
Jinja Templating
Olytix Core uses Jinja2 for dynamic SQL generation. Jinja expressions are enclosed in {{ }} for output and {% %} for logic.
Variables
Define and use variables:
{% set order_statuses = ['completed', 'shipped', 'delivered'] %}
{% set lookback_days = 90 %}
SELECT
order_id,
status,
created_at
FROM {{ ref('stg_orders') }}
WHERE status IN (
{% for status in order_statuses %}
'{{ status }}'{% if not loop.last %},{% endif %}
{% endfor %}
)
AND created_at >= CURRENT_DATE - INTERVAL '{{ lookback_days }} days'
Project Variables
Access variables defined in olytix-core_project.yml:
# olytix-core_project.yml
vars:
fiscal_year_start: '04-01'
default_currency: 'USD'
SELECT
*,
'{{ var("default_currency") }}' AS currency
FROM {{ ref('fct_orders') }}
Conditionals
Use if/else for conditional SQL:
SELECT
order_id,
customer_id,
{% if target.name == 'prod' %}
customer_email,
customer_phone
{% else %}
'REDACTED' AS customer_email,
'REDACTED' AS customer_phone
{% endif %}
FROM {{ ref('stg_customers') }}
Loops
Generate repetitive SQL:
{% set metrics = ['revenue', 'orders', 'customers'] %}
SELECT
date_month,
{% for metric in metrics %}
SUM({{ metric }}) AS total_{{ metric }},
AVG({{ metric }}) AS avg_{{ metric }}{% if not loop.last %},{% endif %}
{% endfor %}
FROM {{ ref('fct_monthly_summary') }}
GROUP BY date_month
The ref() Function
The ref() function references other models, creating automatic dependencies:
SELECT
o.order_id,
o.customer_id,
c.customer_name,
c.customer_region
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
ON o.customer_id = c.customer_id
How ref() Works
- Dependency tracking: Olytix Core builds a DAG from ref() calls
- Execution order: Models run after their dependencies
- Schema resolution: ref() resolves to the correct schema at runtime
Cross-Project References
Reference models from other projects:
SELECT * FROM {{ ref('shared_project', 'dim_date') }}
The source() Function
The source() function references raw tables defined in sources/*.yml:
SELECT
id AS order_id,
customer_id,
amount,
created_at AS order_date
FROM {{ source('raw', 'orders') }}
Source Definition
Define sources in YAML:
# sources/raw.yml
sources:
- name: raw
database: production
schema: public
tables:
- name: orders
columns:
- name: id
- name: customer_id
- name: amount
- name: created_at
source() vs ref()
| Function | Use Case | Creates Dependency |
|---|---|---|
source() | Reference raw external tables | Source -> Model |
ref() | Reference other Olytix Core models | Model -> Model |
Model Layers
Organize models into logical layers:
Staging Layer
One-to-one with sources. Clean, rename, and type-cast:
-- models/staging/stg_orders.sql
{{ config(materialized='view') }}
SELECT
id AS order_id,
customer_id,
CAST(amount AS DECIMAL(10,2)) AS order_amount,
CAST(created_at AS TIMESTAMP) AS order_timestamp,
CASE status
WHEN 'C' THEN 'completed'
WHEN 'P' THEN 'pending'
WHEN 'X' THEN 'cancelled'
ELSE 'unknown'
END AS order_status
FROM {{ source('raw', 'orders') }}
WHERE id IS NOT NULL
Intermediate Layer
Complex transformations and business logic:
-- models/intermediate/int_order_items_enriched.sql
{{ config(materialized='ephemeral') }}
SELECT
oi.order_id,
oi.product_id,
oi.quantity,
oi.unit_price,
oi.quantity * oi.unit_price AS line_total,
p.product_name,
p.category AS product_category,
p.cost AS product_cost
FROM {{ ref('stg_order_items') }} oi
LEFT JOIN {{ ref('stg_products') }} p
ON oi.product_id = p.product_id
Marts Layer
Final fact and dimension tables for analytics:
-- models/marts/fct_orders.sql
{{ config(materialized='table') }}
SELECT
o.order_id,
o.order_timestamp,
o.order_status,
o.customer_id,
c.customer_name,
c.customer_region,
SUM(oi.line_total) AS order_total,
SUM(oi.quantity * oi.product_cost) AS order_cost,
SUM(oi.line_total) - SUM(oi.quantity * oi.product_cost) AS order_profit,
COUNT(DISTINCT oi.product_id) AS unique_products
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
ON o.customer_id = c.customer_id
LEFT JOIN {{ ref('int_order_items_enriched') }} oi
ON o.order_id = oi.order_id
GROUP BY
o.order_id,
o.order_timestamp,
o.order_status,
o.customer_id,
c.customer_name,
c.customer_region
Hooks
Execute SQL before or after model runs:
{{ config(
materialized='table',
pre_hook=[
"DELETE FROM {{ this }} WHERE order_date < CURRENT_DATE - INTERVAL '2 years'"
],
post_hook=[
"CREATE INDEX IF NOT EXISTS idx_order_date ON {{ this }} (order_date)",
"ANALYZE {{ this }}"
]
) }}
SELECT * FROM {{ ref('stg_orders') }}
The this Variable
{{ this }} refers to the current model's database object:
{{ config(
post_hook="GRANT SELECT ON {{ this }} TO analytics_role"
) }}
Model Documentation
Document models in a schema.yml file:
# models/marts/schema.yml
version: 2
models:
- name: fct_orders
description: |
Fact table containing completed orders with customer
and product enrichment. Grain is one row per order.
meta:
owner: analytics-team
refresh: daily
columns:
- name: order_id
description: Primary key
tests:
- unique
- not_null
- name: order_total
description: Sum of all line items after discounts
tests:
- not_null
Best Practices
Naming Conventions
| Layer | Prefix | Example |
|---|---|---|
| Staging | stg_ | stg_orders |
| Intermediate | int_ | int_order_items_enriched |
| Fact | fct_ | fct_orders |
| Dimension | dim_ | dim_customers |
SQL Style
- Use lowercase for SQL keywords
- Use meaningful aliases
- Format for readability
-- Good
SELECT
o.order_id,
o.customer_id,
c.customer_name
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
ON o.customer_id = c.customer_id
-- Avoid
SELECT o.order_id, o.customer_id, c.customer_name FROM {{ref('stg_orders')}} o LEFT JOIN {{ref('stg_customers')}} c ON o.customer_id = c.customer_id
Keep Models Focused
Each model should do one thing well:
- Staging: Clean and rename only
- Intermediate: One transformation concern
- Marts: Final business entity
Next Steps
- Learn about materializations - Understand view, table, incremental, and ephemeral
- Set up incremental models - Handle large datasets efficiently
- Define cubes - Build semantic layer on top of models