Data Tests
Data tests validate the quality and integrity of your data. Olytix Core provides built-in tests that run against your models to catch data issues before they impact downstream analytics.
Overview
Data tests are SQL queries that return rows indicating failures. If a test returns zero rows, it passes. Any returned rows represent violations.
┌─────────────────────────────────────────────────────────────────────┐
│ Data Test Flow │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Model Data │
│ ┌──────────────┐ │
│ │ fct_orders │ │
│ │ ─────────────│ │
│ │ order_id │──┐ │
│ │ customer_id │ │ │
│ │ status │ │ │
│ └──────────────┘ │ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Data Tests │ │
│ │ ─────────────-│ │
│ │ unique │ │
│ │ not_null │ │
│ │ accepted_vals │ │
│ └───────┬───────┘ │
│ │ │
│ ┌────────────┴────────────┐ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ PASS │ │ FAIL │ │
│ │ 0 rows │ │ N rows │ │
│ └──────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Built-in Test Types
Olytix Core provides four built-in test types that cover common data quality requirements.
unique
Validates that a column contains no duplicate values. Essential for primary keys and business identifiers.
# models/marts/schema.yml
version: 2
models:
- name: fct_orders
columns:
- name: order_id
description: "Unique order identifier"
tests:
- unique
Generated SQL:
SELECT order_id
FROM analytics.fct_orders
GROUP BY order_id
HAVING COUNT(*) > 1
not_null
Validates that a column contains no NULL values. Critical for required fields and foreign keys.
models:
- name: fct_orders
columns:
- name: order_id
tests:
- not_null
- name: customer_id
description: "Customer who placed the order"
tests:
- not_null
- name: order_date
tests:
- not_null
Generated SQL:
SELECT *
FROM analytics.fct_orders
WHERE order_id IS NULL
accepted_values
Validates that a column only contains values from a specified list. Useful for status fields, categories, and enums.
models:
- name: fct_orders
columns:
- name: status
description: "Order status"
tests:
- accepted_values:
values: ['completed', 'pending', 'cancelled', 'refunded']
Generated SQL:
SELECT *
FROM analytics.fct_orders
WHERE status NOT IN ('completed', 'pending', 'cancelled', 'refunded')
You can also handle NULL values explicitly:
columns:
- name: payment_method
tests:
- accepted_values:
values: ['credit_card', 'debit_card', 'paypal', 'bank_transfer']
quote: true # Quote string values in SQL
relationships
Validates referential integrity between models. Ensures foreign keys reference valid primary keys.
models:
- name: fct_orders
columns:
- name: customer_id
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
Generated SQL:
SELECT orders.customer_id
FROM analytics.fct_orders AS orders
LEFT JOIN analytics.dim_customers AS customers
ON orders.customer_id = customers.customer_id
WHERE customers.customer_id IS NULL
AND orders.customer_id IS NOT NULL
Complete YAML Configuration Example
Here is a comprehensive example showing multiple tests on a single model:
# models/marts/schema.yml
version: 2
models:
- name: fct_orders
description: "Fact table containing order transactions"
meta:
owner: analytics-team
contains_pii: false
columns:
- name: order_id
description: "Unique order identifier (primary key)"
tests:
- unique
- not_null
- name: customer_id
description: "Foreign key to dim_customers"
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- name: product_id
description: "Foreign key to dim_products"
tests:
- not_null
- relationships:
to: ref('dim_products')
field: product_id
- name: order_date
description: "Date the order was placed"
tests:
- not_null
- name: status
description: "Current order status"
tests:
- not_null
- accepted_values:
values: ['completed', 'pending', 'cancelled', 'refunded', 'processing']
- name: net_amount
description: "Order total after discounts"
tests:
- not_null
- name: dim_customers
description: "Customer dimension table"
columns:
- name: customer_id
description: "Primary key"
tests:
- unique
- not_null
- name: email
description: "Customer email address"
tests:
- unique
- not_null
- name: region
description: "Geographic region"
tests:
- accepted_values:
values: ['north_america', 'europe', 'asia_pacific', 'latin_america']
Test Severity
Control how test failures are handled using the severity parameter:
columns:
- name: email
tests:
- unique:
severity: error # Fail the build (default)
- not_null:
severity: warn # Log warning but continue
| Severity | Behavior |
|---|---|
error | Test failure stops the build |
warn | Test failure logs a warning but allows build to continue |
Running Tests
Execute data tests using the CLI:
# Run all tests
olytix-core test
# Run tests for specific models
olytix-core test --select fct_orders dim_customers
# Run tests with upstream dependencies
olytix-core test --select +fct_orders
# Run only specific test types
olytix-core test --test-types unique not_null
# Run tests and show detailed output
olytix-core test --verbose
Example Output:
Running data tests...
Testing fct_orders...
✓ unique: order_id (0 failures)
✓ not_null: order_id (0 failures)
✓ not_null: customer_id (0 failures)
✓ relationships: customer_id -> dim_customers.customer_id (0 failures)
✓ accepted_values: status (0 failures)
Testing dim_customers...
✓ unique: customer_id (0 failures)
✓ unique: email (0 failures)
✓ not_null: email (0 failures)
✗ accepted_values: region (3 failures)
Found invalid values: ['unknown', 'other', 'undefined']
Tests completed: 8 passed, 1 failed
Custom Tests
For validation logic not covered by built-in tests, create custom SQL tests:
# tests/data_quality.yml
version: 2
tests:
- name: assert_positive_amounts
description: "Verify all order amounts are positive"
model: fct_orders
sql: |
SELECT order_id, net_amount
FROM {{ ref('fct_orders') }}
WHERE net_amount <= 0
- name: assert_orders_have_products
description: "Every order must have at least one product"
model: fct_orders
sql: |
SELECT o.order_id
FROM {{ ref('fct_orders') }} o
LEFT JOIN {{ ref('fct_order_items') }} oi
ON o.order_id = oi.order_id
GROUP BY o.order_id
HAVING COUNT(oi.product_id) = 0
Best Practices
1. Test Primary Keys
Always test primary keys for uniqueness and non-null values:
columns:
- name: order_id
tests:
- unique
- not_null
2. Test Foreign Keys
Validate referential integrity for all foreign keys:
columns:
- name: customer_id
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
3. Test Business Rules
Use accepted_values for status fields and categories:
columns:
- name: order_status
tests:
- accepted_values:
values: ['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']
4. Use Warnings for Non-Critical Tests
Apply severity: warn for tests that should not block builds:
columns:
- name: optional_field
tests:
- not_null:
severity: warn
5. Document Test Purpose
Add descriptions to clarify test intent:
columns:
- name: email
description: "Customer email - must be unique across all customers"
tests:
- unique
- not_null
Next Steps
Now that you understand data tests:
- Schema Tests - Validate column types and constraints
- CI/CD Integration - Automate tests in your pipeline