Your First Model
Models are SQL transformations that prepare your raw data for the semantic layer. Olytix Core uses dbt-compatible syntax, making it easy to migrate existing projects or leverage familiar patterns.
What is a Model?
A model is a SQL SELECT statement that transforms raw data into an analytics-ready format:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Sources │ ──► │ Models │ ──► │ Cubes │
│ (raw data) │ │ (transforms) │ │ (semantic) │
└──────────────┘ └──────────────┘ └──────────────┘
Models can:
- Clean and validate data
- Join multiple sources
- Calculate derived fields
- Apply business logic
- Create reusable building blocks
Prerequisites
Before creating a model, you need:
- A Olytix Core project initialized (
olytix-core init) - A source defined in
sources/
Step 1: Define a Source
First, tell Olytix Core about your raw data. Create sources/raw.yml:
version: 2
sources:
- name: raw
database: analytics
schema: public
description: "Raw e-commerce data from production database"
tables:
- name: orders
description: "Raw order transactions"
columns:
- name: id
description: "Order primary key"
- name: customer_id
description: "Foreign key to customers"
- name: product_id
description: "Foreign key to products"
- name: quantity
description: "Number of items ordered"
- name: unit_price
description: "Price per unit in USD"
- name: discount_pct
description: "Discount percentage (0-100)"
- name: order_date
description: "Timestamp of order"
- name: status
description: "Order status code"
- name: created_at
description: "Record creation timestamp"
- name: customers
description: "Customer master data"
columns:
- name: id
description: "Customer primary key"
- name: email
description: "Customer email"
- name: name
description: "Customer full name"
- name: region
description: "Geographic region"
- name: created_at
description: "Registration date"
- name: products
description: "Product catalog"
columns:
- name: id
description: "Product primary key"
- name: name
description: "Product name"
- name: category
description: "Product category"
- name: cost
description: "Product cost"
Step 2: Create a Staging Model
Staging models clean and standardize raw data. Create models/staging/stg_orders.sql:
{{ config(materialized='view') }}
-- Staging model: Clean and standardize raw orders
SELECT
-- Rename to consistent naming convention
id AS order_id,
customer_id,
product_id,
-- Data type conversions
CAST(quantity AS INTEGER) AS quantity,
CAST(unit_price AS DECIMAL(10,2)) AS unit_price,
-- Handle nulls and defaults
COALESCE(discount_pct, 0) AS discount_pct,
-- Standardize date format
CAST(order_date AS DATE) AS order_date,
-- Map status codes to readable values
CASE status
WHEN 'C' THEN 'completed'
WHEN 'P' THEN 'pending'
WHEN 'X' THEN 'cancelled'
WHEN 'R' THEN 'refunded'
ELSE 'unknown'
END AS status,
-- Metadata
created_at
FROM {{ source('raw', 'orders') }}
-- Filter out test data
WHERE customer_id IS NOT NULL
AND order_date >= '2020-01-01'
Understanding the Config Block
The {{ config() }} block controls how the model is materialized:
{{ config(
materialized='view', -- 'view', 'table', 'incremental', or 'ephemeral'
schema='staging', -- Target schema (optional)
tags=['staging', 'daily'] -- Tags for organizing models
) }}
Materialization Types
| Type | Description | Use Case |
|---|---|---|
view | SQL view, always fresh | Staging models, small data |
table | Physical table, rebuilds fully | Mart tables, aggregations |
incremental | Appends new data only | Large fact tables, event data |
ephemeral | CTE, not materialized | Intermediate transformations |
Step 3: Create a Mart Model
Mart models combine and enrich data for analytics. Create models/marts/fct_orders.sql:
{{ config(materialized='table') }}
-- Fact table: Order transactions with enriched data
WITH orders AS (
SELECT * FROM {{ ref('stg_orders') }}
),
customers AS (
SELECT
id AS customer_id,
name AS customer_name,
email AS customer_email,
region AS customer_region,
created_at AS customer_since
FROM {{ source('raw', 'customers') }}
),
products AS (
SELECT
id AS product_id,
name AS product_name,
category AS product_category,
cost AS product_cost
FROM {{ source('raw', 'products') }}
)
SELECT
-- Order fields
o.order_id,
o.order_date,
o.status,
o.quantity,
o.unit_price,
o.discount_pct,
-- Calculated fields
o.quantity * o.unit_price AS gross_amount,
o.quantity * o.unit_price * (1 - o.discount_pct / 100) AS net_amount,
o.quantity * o.unit_price * (o.discount_pct / 100) AS discount_amount,
-- Customer fields
o.customer_id,
c.customer_name,
c.customer_region,
-- Product fields
o.product_id,
p.product_name,
p.product_category,
-- Profit calculation
(o.quantity * o.unit_price * (1 - o.discount_pct / 100)) - (o.quantity * p.product_cost) AS profit,
-- Metadata
o.created_at
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
LEFT JOIN products p ON o.product_id = p.product_id
Step 4: Using ref() and source()
Olytix Core provides two key functions for referencing data:
source()
References raw tables defined in your sources:
-- References 'orders' table from 'raw' source
SELECT * FROM {{ source('raw', 'orders') }}
ref()
References other models, creating automatic dependencies:
-- References the stg_orders model
SELECT * FROM {{ ref('stg_orders') }}
The dependency graph is built automatically:
source('raw', 'orders') ──► ref('stg_orders') ──► ref('fct_orders')
Step 5: Add Model Schema
Document your model with a schema file. Create models/marts/schema.yml:
version: 2
models:
- name: fct_orders
description: "Fact table containing order transactions with customer and product enrichment"
meta:
owner: analytics-team
contains_pii: true
columns:
- name: order_id
description: "Unique order identifier"
tests:
- unique
- not_null
- name: order_date
description: "Date the order was placed"
tests:
- not_null
- name: status
description: "Order status (completed, pending, cancelled, refunded)"
tests:
- accepted_values:
values: ['completed', 'pending', 'cancelled', 'refunded', 'unknown']
- name: net_amount
description: "Order amount after discounts"
tests:
- not_null
- name: customer_id
description: "Customer who placed the order"
tests:
- not_null
- relationships:
to: source('raw', 'customers')
field: id
Step 6: Create an Incremental Model
For large tables, use incremental models. Create models/marts/fct_events.sql:
{{ config(
materialized='incremental',
unique_key='event_id',
incremental_strategy='merge'
) }}
SELECT
event_id,
user_id,
event_type,
event_properties,
created_at
FROM {{ source('raw', 'events') }}
{% if is_incremental() %}
-- Only process new events on incremental runs
WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
Incremental Strategies
| Strategy | Description | Best For |
|---|---|---|
append | Insert new rows only | Event logs, immutable data |
merge | Upsert based on unique key | Slowly changing data |
delete+insert | Delete matching, then insert | Large batch updates |
Step 7: Compile and Run
Compile your models to validate them:
olytix-core compile
Output:
Compiling project...
✓ Loaded 1 source with 3 tables
✓ Compiled 3 models:
- stg_orders (view)
- fct_orders (table)
- fct_events (incremental)
✓ Built dependency graph
✓ Generated manifest at target/manifest.json
Compilation completed in 0.52s
Run the models to materialize them:
# Run all models
olytix-core run
# Run specific model
olytix-core run --select fct_orders
# Run model and its dependencies
olytix-core run --select +fct_orders
# Run model and its dependents
olytix-core run --select fct_orders+
Step 8: Using Jinja for Dynamic SQL
Olytix Core supports Jinja templating for dynamic SQL:
Variables
{% set date_column = 'order_date' %}
{% set lookback_days = 30 %}
SELECT *
FROM {{ ref('stg_orders') }}
WHERE {{ date_column }} >= CURRENT_DATE - INTERVAL '{{ lookback_days }} days'
Conditionals
SELECT
order_id,
{% if target.name == 'prod' %}
customer_email -- Include PII in prod
{% else %}
'REDACTED' AS customer_email -- Mask in dev
{% endif %}
FROM {{ ref('fct_orders') }}
Loops
SELECT
order_id,
{% for status in ['completed', 'pending', 'cancelled'] %}
SUM(CASE WHEN status = '{{ status }}' THEN 1 ELSE 0 END) AS {{ status }}_count
{% if not loop.last %},{% endif %}
{% endfor %}
FROM {{ ref('fct_orders') }}
GROUP BY order_id
Macros
Create reusable SQL snippets. Create macros/date_spine.sql:
{% macro date_spine(start_date, end_date) %}
WITH RECURSIVE dates AS (
SELECT DATE '{{ start_date }}' AS date_day
UNION ALL
SELECT date_day + INTERVAL '1 day'
FROM dates
WHERE date_day < DATE '{{ end_date }}'
)
SELECT date_day FROM dates
{% endmacro %}
Use in models:
SELECT
d.date_day,
COALESCE(o.order_count, 0) AS order_count
FROM ({{ date_spine('2024-01-01', '2024-12-31') }}) d
LEFT JOIN (
SELECT order_date, COUNT(*) AS order_count
FROM {{ ref('fct_orders') }}
GROUP BY order_date
) o ON d.date_day = o.order_date
Project Structure
Your models folder should look like this:
models/
├── staging/
│ ├── stg_orders.sql
│ ├── stg_customers.sql
│ └── stg_products.sql
├── marts/
│ ├── fct_orders.sql
│ ├── fct_events.sql
│ ├── dim_customers.sql
│ └── schema.yml
└── intermediate/
└── int_order_items.sql
Best Practices
Naming Conventions
stg_prefix for staging modelsint_prefix for intermediate modelsfct_prefix for fact tablesdim_prefix for dimension tables
Model Layers
- Staging: Clean and rename raw data
- Intermediate: Complex joins and business logic
- Marts: Final tables for analytics
Performance Tips
- Use
viewfor simple transformations - Use
tablefor complex aggregations - Use
incrementalfor large, append-only data - Add indexes on frequently filtered columns
Next Steps
Now that you've created models: