Skip to main content

Model Basics

For Data Analysts

Models are SQL transformations that convert raw data into analytics-ready tables. Olytix Core models use Jinja templating for dynamic SQL generation and dependency management.

What is a Model?

A model is a SQL SELECT statement stored in a .sql file within the models/ directory:

models/
├── staging/
│ ├── stg_orders.sql
│ └── stg_customers.sql
├── intermediate/
│ └── int_order_items.sql
└── marts/
├── fct_orders.sql
└── dim_customers.sql

When you run olytix-core run, each model compiles to SQL and executes against your warehouse, creating a view or table.

Basic Model Structure

A model consists of an optional config block followed by SQL:

{{ config(materialized='table') }}

SELECT
order_id,
customer_id,
order_date,
total_amount
FROM {{ source('raw', 'orders') }}
WHERE status = 'completed'

The Config Block

The {{ config() }} block defines model behavior:

{{ config(
materialized='table',
schema='analytics',
alias='orders_fact',
tags=['daily', 'finance'],
enabled=true
) }}

Config Options

OptionTypeDescription
materializedstringHow to persist the model (view, table, incremental, ephemeral)
schemastringTarget schema override
aliasstringOverride the table/view name
tagslistTags for selection and organization
enabledbooleanWhether to include in runs (default: true)
pre_hooklistSQL to run before model
post_hooklistSQL to run after model

Config Inheritance

Set defaults in olytix-core_project.yml:

models:
my_project:
+materialized: view
staging:
+schema: staging
marts:
+materialized: table
+schema: analytics

File-level config overrides project defaults:

-- This model uses 'incremental' despite project default of 'table'
{{ config(materialized='incremental') }}

Jinja Templating

Olytix Core uses Jinja2 for dynamic SQL generation. Jinja expressions are enclosed in {{ }} for output and {% %} for logic.

Variables

Define and use variables:

{% set order_statuses = ['completed', 'shipped', 'delivered'] %}
{% set lookback_days = 90 %}

SELECT
order_id,
status,
created_at
FROM {{ ref('stg_orders') }}
WHERE status IN (
{% for status in order_statuses %}
'{{ status }}'{% if not loop.last %},{% endif %}
{% endfor %}
)
AND created_at >= CURRENT_DATE - INTERVAL '{{ lookback_days }} days'

Project Variables

Access variables defined in olytix-core_project.yml:

# olytix-core_project.yml
vars:
fiscal_year_start: '04-01'
default_currency: 'USD'
SELECT
*,
'{{ var("default_currency") }}' AS currency
FROM {{ ref('fct_orders') }}

Conditionals

Use if/else for conditional SQL:

SELECT
order_id,
customer_id,
{% if target.name == 'prod' %}
customer_email,
customer_phone
{% else %}
'REDACTED' AS customer_email,
'REDACTED' AS customer_phone
{% endif %}
FROM {{ ref('stg_customers') }}

Loops

Generate repetitive SQL:

{% set metrics = ['revenue', 'orders', 'customers'] %}

SELECT
date_month,
{% for metric in metrics %}
SUM({{ metric }}) AS total_{{ metric }},
AVG({{ metric }}) AS avg_{{ metric }}{% if not loop.last %},{% endif %}
{% endfor %}
FROM {{ ref('fct_monthly_summary') }}
GROUP BY date_month

The ref() Function

The ref() function references other models, creating automatic dependencies:

SELECT
o.order_id,
o.customer_id,
c.customer_name,
c.customer_region
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
ON o.customer_id = c.customer_id

How ref() Works

  1. Dependency tracking: Olytix Core builds a DAG from ref() calls
  2. Execution order: Models run after their dependencies
  3. Schema resolution: ref() resolves to the correct schema at runtime

Cross-Project References

Reference models from other projects:

SELECT * FROM {{ ref('shared_project', 'dim_date') }}

The source() Function

The source() function references raw tables defined in sources/*.yml:

SELECT
id AS order_id,
customer_id,
amount,
created_at AS order_date
FROM {{ source('raw', 'orders') }}

Source Definition

Define sources in YAML:

# sources/raw.yml
sources:
- name: raw
database: production
schema: public
tables:
- name: orders
columns:
- name: id
- name: customer_id
- name: amount
- name: created_at

source() vs ref()

FunctionUse CaseCreates Dependency
source()Reference raw external tablesSource -> Model
ref()Reference other Olytix Core modelsModel -> Model

Model Layers

Organize models into logical layers:

Staging Layer

One-to-one with sources. Clean, rename, and type-cast:

-- models/staging/stg_orders.sql
{{ config(materialized='view') }}

SELECT
id AS order_id,
customer_id,
CAST(amount AS DECIMAL(10,2)) AS order_amount,
CAST(created_at AS TIMESTAMP) AS order_timestamp,
CASE status
WHEN 'C' THEN 'completed'
WHEN 'P' THEN 'pending'
WHEN 'X' THEN 'cancelled'
ELSE 'unknown'
END AS order_status
FROM {{ source('raw', 'orders') }}
WHERE id IS NOT NULL

Intermediate Layer

Complex transformations and business logic:

-- models/intermediate/int_order_items_enriched.sql
{{ config(materialized='ephemeral') }}

SELECT
oi.order_id,
oi.product_id,
oi.quantity,
oi.unit_price,
oi.quantity * oi.unit_price AS line_total,
p.product_name,
p.category AS product_category,
p.cost AS product_cost
FROM {{ ref('stg_order_items') }} oi
LEFT JOIN {{ ref('stg_products') }} p
ON oi.product_id = p.product_id

Marts Layer

Final fact and dimension tables for analytics:

-- models/marts/fct_orders.sql
{{ config(materialized='table') }}

SELECT
o.order_id,
o.order_timestamp,
o.order_status,
o.customer_id,
c.customer_name,
c.customer_region,
SUM(oi.line_total) AS order_total,
SUM(oi.quantity * oi.product_cost) AS order_cost,
SUM(oi.line_total) - SUM(oi.quantity * oi.product_cost) AS order_profit,
COUNT(DISTINCT oi.product_id) AS unique_products
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
ON o.customer_id = c.customer_id
LEFT JOIN {{ ref('int_order_items_enriched') }} oi
ON o.order_id = oi.order_id
GROUP BY
o.order_id,
o.order_timestamp,
o.order_status,
o.customer_id,
c.customer_name,
c.customer_region

Hooks

Execute SQL before or after model runs:

{{ config(
materialized='table',
pre_hook=[
"DELETE FROM {{ this }} WHERE order_date < CURRENT_DATE - INTERVAL '2 years'"
],
post_hook=[
"CREATE INDEX IF NOT EXISTS idx_order_date ON {{ this }} (order_date)",
"ANALYZE {{ this }}"
]
) }}

SELECT * FROM {{ ref('stg_orders') }}

The this Variable

{{ this }} refers to the current model's database object:

{{ config(
post_hook="GRANT SELECT ON {{ this }} TO analytics_role"
) }}

Model Documentation

Document models in a schema.yml file:

# models/marts/schema.yml
version: 2

models:
- name: fct_orders
description: |
Fact table containing completed orders with customer
and product enrichment. Grain is one row per order.

meta:
owner: analytics-team
refresh: daily

columns:
- name: order_id
description: Primary key
tests:
- unique
- not_null

- name: order_total
description: Sum of all line items after discounts
tests:
- not_null

Best Practices

Naming Conventions

LayerPrefixExample
Stagingstg_stg_orders
Intermediateint_int_order_items_enriched
Factfct_fct_orders
Dimensiondim_dim_customers

SQL Style

  • Use lowercase for SQL keywords
  • Use meaningful aliases
  • Format for readability
-- Good
SELECT
o.order_id,
o.customer_id,
c.customer_name
FROM {{ ref('stg_orders') }} o
LEFT JOIN {{ ref('stg_customers') }} c
ON o.customer_id = c.customer_id

-- Avoid
SELECT o.order_id, o.customer_id, c.customer_name FROM {{ref('stg_orders')}} o LEFT JOIN {{ref('stg_customers')}} c ON o.customer_id = c.customer_id

Keep Models Focused

Each model should do one thing well:

  • Staging: Clean and rename only
  • Intermediate: One transformation concern
  • Marts: Final business entity

Next Steps

  1. Learn about materializations - Understand view, table, incremental, and ephemeral
  2. Set up incremental models - Handle large datasets efficiently
  3. Define cubes - Build semantic layer on top of models