Skip to main content

Your First Model

For Data Analysts

Models are SQL transformations that prepare your raw data for the semantic layer. Olytix Core uses dbt-compatible syntax, making it easy to migrate existing projects or leverage familiar patterns.

What is a Model?

A model is a SQL SELECT statement that transforms raw data into an analytics-ready format:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Sources │ ──► │ Models │ ──► │ Cubes │
│ (raw data) │ │ (transforms) │ │ (semantic) │
└──────────────┘ └──────────────┘ └──────────────┘

Models can:

  • Clean and validate data
  • Join multiple sources
  • Calculate derived fields
  • Apply business logic
  • Create reusable building blocks

Prerequisites

Before creating a model, you need:

  1. A Olytix Core project initialized (olytix-core init)
  2. A source defined in sources/

Step 1: Define a Source

First, tell Olytix Core about your raw data. Create sources/raw.yml:

version: 2

sources:
- name: raw
database: analytics
schema: public
description: "Raw e-commerce data from production database"

tables:
- name: orders
description: "Raw order transactions"
columns:
- name: id
description: "Order primary key"
- name: customer_id
description: "Foreign key to customers"
- name: product_id
description: "Foreign key to products"
- name: quantity
description: "Number of items ordered"
- name: unit_price
description: "Price per unit in USD"
- name: discount_pct
description: "Discount percentage (0-100)"
- name: order_date
description: "Timestamp of order"
- name: status
description: "Order status code"
- name: created_at
description: "Record creation timestamp"

- name: customers
description: "Customer master data"
columns:
- name: id
description: "Customer primary key"
- name: email
description: "Customer email"
- name: name
description: "Customer full name"
- name: region
description: "Geographic region"
- name: created_at
description: "Registration date"

- name: products
description: "Product catalog"
columns:
- name: id
description: "Product primary key"
- name: name
description: "Product name"
- name: category
description: "Product category"
- name: cost
description: "Product cost"

Step 2: Create a Staging Model

Staging models clean and standardize raw data. Create models/staging/stg_orders.sql:

{{ config(materialized='view') }}

-- Staging model: Clean and standardize raw orders
SELECT
-- Rename to consistent naming convention
id AS order_id,
customer_id,
product_id,

-- Data type conversions
CAST(quantity AS INTEGER) AS quantity,
CAST(unit_price AS DECIMAL(10,2)) AS unit_price,

-- Handle nulls and defaults
COALESCE(discount_pct, 0) AS discount_pct,

-- Standardize date format
CAST(order_date AS DATE) AS order_date,

-- Map status codes to readable values
CASE status
WHEN 'C' THEN 'completed'
WHEN 'P' THEN 'pending'
WHEN 'X' THEN 'cancelled'
WHEN 'R' THEN 'refunded'
ELSE 'unknown'
END AS status,

-- Metadata
created_at

FROM {{ source('raw', 'orders') }}

-- Filter out test data
WHERE customer_id IS NOT NULL
AND order_date >= '2020-01-01'

Understanding the Config Block

The {{ config() }} block controls how the model is materialized:

{{ config(
materialized='view', -- 'view', 'table', 'incremental', or 'ephemeral'
schema='staging', -- Target schema (optional)
tags=['staging', 'daily'] -- Tags for organizing models
) }}

Materialization Types

TypeDescriptionUse Case
viewSQL view, always freshStaging models, small data
tablePhysical table, rebuilds fullyMart tables, aggregations
incrementalAppends new data onlyLarge fact tables, event data
ephemeralCTE, not materializedIntermediate transformations

Step 3: Create a Mart Model

Mart models combine and enrich data for analytics. Create models/marts/fct_orders.sql:

{{ config(materialized='table') }}

-- Fact table: Order transactions with enriched data
WITH orders AS (
SELECT * FROM {{ ref('stg_orders') }}
),

customers AS (
SELECT
id AS customer_id,
name AS customer_name,
email AS customer_email,
region AS customer_region,
created_at AS customer_since
FROM {{ source('raw', 'customers') }}
),

products AS (
SELECT
id AS product_id,
name AS product_name,
category AS product_category,
cost AS product_cost
FROM {{ source('raw', 'products') }}
)

SELECT
-- Order fields
o.order_id,
o.order_date,
o.status,
o.quantity,
o.unit_price,
o.discount_pct,

-- Calculated fields
o.quantity * o.unit_price AS gross_amount,
o.quantity * o.unit_price * (1 - o.discount_pct / 100) AS net_amount,
o.quantity * o.unit_price * (o.discount_pct / 100) AS discount_amount,

-- Customer fields
o.customer_id,
c.customer_name,
c.customer_region,

-- Product fields
o.product_id,
p.product_name,
p.product_category,

-- Profit calculation
(o.quantity * o.unit_price * (1 - o.discount_pct / 100)) - (o.quantity * p.product_cost) AS profit,

-- Metadata
o.created_at

FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
LEFT JOIN products p ON o.product_id = p.product_id

Step 4: Using ref() and source()

Olytix Core provides two key functions for referencing data:

source()

References raw tables defined in your sources:

-- References 'orders' table from 'raw' source
SELECT * FROM {{ source('raw', 'orders') }}

ref()

References other models, creating automatic dependencies:

-- References the stg_orders model
SELECT * FROM {{ ref('stg_orders') }}

The dependency graph is built automatically:

source('raw', 'orders') ──► ref('stg_orders') ──► ref('fct_orders')

Step 5: Add Model Schema

Document your model with a schema file. Create models/marts/schema.yml:

version: 2

models:
- name: fct_orders
description: "Fact table containing order transactions with customer and product enrichment"

meta:
owner: analytics-team
contains_pii: true

columns:
- name: order_id
description: "Unique order identifier"
tests:
- unique
- not_null

- name: order_date
description: "Date the order was placed"
tests:
- not_null

- name: status
description: "Order status (completed, pending, cancelled, refunded)"
tests:
- accepted_values:
values: ['completed', 'pending', 'cancelled', 'refunded', 'unknown']

- name: net_amount
description: "Order amount after discounts"
tests:
- not_null

- name: customer_id
description: "Customer who placed the order"
tests:
- not_null
- relationships:
to: source('raw', 'customers')
field: id

Step 6: Create an Incremental Model

For large tables, use incremental models. Create models/marts/fct_events.sql:

{{ config(
materialized='incremental',
unique_key='event_id',
incremental_strategy='merge'
) }}

SELECT
event_id,
user_id,
event_type,
event_properties,
created_at

FROM {{ source('raw', 'events') }}

{% if is_incremental() %}
-- Only process new events on incremental runs
WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}

Incremental Strategies

StrategyDescriptionBest For
appendInsert new rows onlyEvent logs, immutable data
mergeUpsert based on unique keySlowly changing data
delete+insertDelete matching, then insertLarge batch updates

Step 7: Compile and Run

Compile your models to validate them:

olytix-core compile

Output:

Compiling project...
✓ Loaded 1 source with 3 tables
✓ Compiled 3 models:
- stg_orders (view)
- fct_orders (table)
- fct_events (incremental)
✓ Built dependency graph
✓ Generated manifest at target/manifest.json

Compilation completed in 0.52s

Run the models to materialize them:

# Run all models
olytix-core run

# Run specific model
olytix-core run --select fct_orders

# Run model and its dependencies
olytix-core run --select +fct_orders

# Run model and its dependents
olytix-core run --select fct_orders+

Step 8: Using Jinja for Dynamic SQL

Olytix Core supports Jinja templating for dynamic SQL:

Variables

{% set date_column = 'order_date' %}
{% set lookback_days = 30 %}

SELECT *
FROM {{ ref('stg_orders') }}
WHERE {{ date_column }} >= CURRENT_DATE - INTERVAL '{{ lookback_days }} days'

Conditionals

SELECT
order_id,
{% if target.name == 'prod' %}
customer_email -- Include PII in prod
{% else %}
'REDACTED' AS customer_email -- Mask in dev
{% endif %}
FROM {{ ref('fct_orders') }}

Loops

SELECT
order_id,
{% for status in ['completed', 'pending', 'cancelled'] %}
SUM(CASE WHEN status = '{{ status }}' THEN 1 ELSE 0 END) AS {{ status }}_count
{% if not loop.last %},{% endif %}
{% endfor %}
FROM {{ ref('fct_orders') }}
GROUP BY order_id

Macros

Create reusable SQL snippets. Create macros/date_spine.sql:

{% macro date_spine(start_date, end_date) %}
WITH RECURSIVE dates AS (
SELECT DATE '{{ start_date }}' AS date_day
UNION ALL
SELECT date_day + INTERVAL '1 day'
FROM dates
WHERE date_day < DATE '{{ end_date }}'
)
SELECT date_day FROM dates
{% endmacro %}

Use in models:

SELECT
d.date_day,
COALESCE(o.order_count, 0) AS order_count
FROM ({{ date_spine('2024-01-01', '2024-12-31') }}) d
LEFT JOIN (
SELECT order_date, COUNT(*) AS order_count
FROM {{ ref('fct_orders') }}
GROUP BY order_date
) o ON d.date_day = o.order_date

Project Structure

Your models folder should look like this:

models/
├── staging/
│ ├── stg_orders.sql
│ ├── stg_customers.sql
│ └── stg_products.sql
├── marts/
│ ├── fct_orders.sql
│ ├── fct_events.sql
│ ├── dim_customers.sql
│ └── schema.yml
└── intermediate/
└── int_order_items.sql

Best Practices

Naming Conventions

  • stg_ prefix for staging models
  • int_ prefix for intermediate models
  • fct_ prefix for fact tables
  • dim_ prefix for dimension tables

Model Layers

  1. Staging: Clean and rename raw data
  2. Intermediate: Complex joins and business logic
  3. Marts: Final tables for analytics

Performance Tips

  • Use view for simple transformations
  • Use table for complex aggregations
  • Use incremental for large, append-only data
  • Add indexes on frequently filtered columns

Next Steps

Now that you've created models:

  1. Understand project structure →
  2. Learn about materializations →
  3. Set up incremental models →
  4. Add data tests →