Skip to main content

Data Tests

For Data Analysts

Data tests validate the quality and integrity of your data. Olytix Core provides built-in tests that run against your models to catch data issues before they impact downstream analytics.

Overview

Data tests are SQL queries that return rows indicating failures. If a test returns zero rows, it passes. Any returned rows represent violations.

┌─────────────────────────────────────────────────────────────────────┐
│ Data Test Flow │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Model Data │
│ ┌──────────────┐ │
│ │ fct_orders │ │
│ │ ─────────────│ │
│ │ order_id │──┐ │
│ │ customer_id │ │ │
│ │ status │ │ │
│ └──────────────┘ │ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Data Tests │ │
│ │ ─────────────-│ │
│ │ unique │ │
│ │ not_null │ │
│ │ accepted_vals │ │
│ └───────┬───────┘ │
│ │ │
│ ┌────────────┴────────────┐ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ PASS │ │ FAIL │ │
│ │ 0 rows │ │ N rows │ │
│ └──────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘

Built-in Test Types

Olytix Core provides four built-in test types that cover common data quality requirements.

unique

Validates that a column contains no duplicate values. Essential for primary keys and business identifiers.

# models/marts/schema.yml
version: 2

models:
- name: fct_orders
columns:
- name: order_id
description: "Unique order identifier"
tests:
- unique

Generated SQL:

SELECT order_id
FROM analytics.fct_orders
GROUP BY order_id
HAVING COUNT(*) > 1

not_null

Validates that a column contains no NULL values. Critical for required fields and foreign keys.

models:
- name: fct_orders
columns:
- name: order_id
tests:
- not_null
- name: customer_id
description: "Customer who placed the order"
tests:
- not_null
- name: order_date
tests:
- not_null

Generated SQL:

SELECT *
FROM analytics.fct_orders
WHERE order_id IS NULL

accepted_values

Validates that a column only contains values from a specified list. Useful for status fields, categories, and enums.

models:
- name: fct_orders
columns:
- name: status
description: "Order status"
tests:
- accepted_values:
values: ['completed', 'pending', 'cancelled', 'refunded']

Generated SQL:

SELECT *
FROM analytics.fct_orders
WHERE status NOT IN ('completed', 'pending', 'cancelled', 'refunded')

You can also handle NULL values explicitly:

columns:
- name: payment_method
tests:
- accepted_values:
values: ['credit_card', 'debit_card', 'paypal', 'bank_transfer']
quote: true # Quote string values in SQL

relationships

Validates referential integrity between models. Ensures foreign keys reference valid primary keys.

models:
- name: fct_orders
columns:
- name: customer_id
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id

Generated SQL:

SELECT orders.customer_id
FROM analytics.fct_orders AS orders
LEFT JOIN analytics.dim_customers AS customers
ON orders.customer_id = customers.customer_id
WHERE customers.customer_id IS NULL
AND orders.customer_id IS NOT NULL

Complete YAML Configuration Example

Here is a comprehensive example showing multiple tests on a single model:

# models/marts/schema.yml
version: 2

models:
- name: fct_orders
description: "Fact table containing order transactions"

meta:
owner: analytics-team
contains_pii: false

columns:
- name: order_id
description: "Unique order identifier (primary key)"
tests:
- unique
- not_null

- name: customer_id
description: "Foreign key to dim_customers"
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id

- name: product_id
description: "Foreign key to dim_products"
tests:
- not_null
- relationships:
to: ref('dim_products')
field: product_id

- name: order_date
description: "Date the order was placed"
tests:
- not_null

- name: status
description: "Current order status"
tests:
- not_null
- accepted_values:
values: ['completed', 'pending', 'cancelled', 'refunded', 'processing']

- name: net_amount
description: "Order total after discounts"
tests:
- not_null

- name: dim_customers
description: "Customer dimension table"

columns:
- name: customer_id
description: "Primary key"
tests:
- unique
- not_null

- name: email
description: "Customer email address"
tests:
- unique
- not_null

- name: region
description: "Geographic region"
tests:
- accepted_values:
values: ['north_america', 'europe', 'asia_pacific', 'latin_america']

Test Severity

Control how test failures are handled using the severity parameter:

columns:
- name: email
tests:
- unique:
severity: error # Fail the build (default)
- not_null:
severity: warn # Log warning but continue
SeverityBehavior
errorTest failure stops the build
warnTest failure logs a warning but allows build to continue

Running Tests

Execute data tests using the CLI:

# Run all tests
olytix-core test

# Run tests for specific models
olytix-core test --select fct_orders dim_customers

# Run tests with upstream dependencies
olytix-core test --select +fct_orders

# Run only specific test types
olytix-core test --test-types unique not_null

# Run tests and show detailed output
olytix-core test --verbose

Example Output:

Running data tests...

Testing fct_orders...
✓ unique: order_id (0 failures)
✓ not_null: order_id (0 failures)
✓ not_null: customer_id (0 failures)
✓ relationships: customer_id -> dim_customers.customer_id (0 failures)
✓ accepted_values: status (0 failures)

Testing dim_customers...
✓ unique: customer_id (0 failures)
✓ unique: email (0 failures)
✓ not_null: email (0 failures)
✗ accepted_values: region (3 failures)
Found invalid values: ['unknown', 'other', 'undefined']

Tests completed: 8 passed, 1 failed

Custom Tests

For validation logic not covered by built-in tests, create custom SQL tests:

# tests/data_quality.yml
version: 2

tests:
- name: assert_positive_amounts
description: "Verify all order amounts are positive"
model: fct_orders
sql: |
SELECT order_id, net_amount
FROM {{ ref('fct_orders') }}
WHERE net_amount <= 0

- name: assert_orders_have_products
description: "Every order must have at least one product"
model: fct_orders
sql: |
SELECT o.order_id
FROM {{ ref('fct_orders') }} o
LEFT JOIN {{ ref('fct_order_items') }} oi
ON o.order_id = oi.order_id
GROUP BY o.order_id
HAVING COUNT(oi.product_id) = 0

Best Practices

1. Test Primary Keys

Always test primary keys for uniqueness and non-null values:

columns:
- name: order_id
tests:
- unique
- not_null

2. Test Foreign Keys

Validate referential integrity for all foreign keys:

columns:
- name: customer_id
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id

3. Test Business Rules

Use accepted_values for status fields and categories:

columns:
- name: order_status
tests:
- accepted_values:
values: ['pending', 'confirmed', 'shipped', 'delivered', 'cancelled']

4. Use Warnings for Non-Critical Tests

Apply severity: warn for tests that should not block builds:

columns:
- name: optional_field
tests:
- not_null:
severity: warn

5. Document Test Purpose

Add descriptions to clarify test intent:

columns:
- name: email
description: "Customer email - must be unique across all customers"
tests:
- unique
- not_null

Next Steps

Now that you understand data tests:

  1. Schema Tests - Validate column types and constraints
  2. CI/CD Integration - Automate tests in your pipeline