Schema Tests
Schema tests validate the structure and types of your data, ensuring columns match expected definitions and constraints are enforced.
Overview
While data tests validate values, schema tests validate structure:
| Test Type | Purpose |
|---|---|
| Data Tests | Value correctness (unique, not_null, accepted_values) |
| Schema Tests | Structure correctness (types, constraints, column existence) |
┌─────────────────────────────────────────────────────────────────────┐
│ Schema Validation │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Expected Schema Actual Schema │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ order_id: INTEGER│ ─► │ order_id: INTEGER│ ✓ Match │
│ │ amount: DECIMAL │ ─► │ amount: VARCHAR │ ✗ Mismatch │
│ │ created: DATE │ ─► │ created: DATE │ ✓ Match │
│ └──────────────────┘ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Column Type Validation
Define expected column types in your schema YAML:
# models/marts/schema.yml
version: 2
models:
- name: fct_orders
description: "Order fact table with type validation"
columns:
- name: order_id
description: "Primary key"
data_type: integer
tests:
- unique
- not_null
- name: customer_id
description: "Foreign key to customers"
data_type: integer
tests:
- not_null
- name: order_date
description: "Date order was placed"
data_type: date
tests:
- not_null
- name: net_amount
description: "Order total after discounts"
data_type: decimal(10,2)
tests:
- not_null
- name: status
description: "Order status"
data_type: varchar(50)
tests:
- accepted_values:
values: ['completed', 'pending', 'cancelled']
- name: is_active
description: "Whether order is active"
data_type: boolean
- name: metadata
description: "Additional order metadata"
data_type: jsonb
Supported Data Types
Olytix Core supports standard SQL data types across different warehouses:
Numeric Types
| Type | Description | Example |
|---|---|---|
integer | Whole numbers | order_count |
bigint | Large integers | event_id |
smallint | Small integers | quantity |
decimal(p,s) | Exact numeric | amount |
numeric(p,s) | Same as decimal | price |
float | Floating point | percentage |
double | Double precision | score |
String Types
| Type | Description | Example |
|---|---|---|
varchar(n) | Variable-length string | name |
char(n) | Fixed-length string | country_code |
text | Unlimited text | description |
Date/Time Types
| Type | Description | Example |
|---|---|---|
date | Date only | order_date |
time | Time only | event_time |
timestamp | Date and time | created_at |
timestamptz | Timestamp with timezone | event_at |
interval | Time interval | duration |
Other Types
| Type | Description | Example |
|---|---|---|
boolean | True/false | is_active |
uuid | UUID values | user_uuid |
json | JSON data | properties |
jsonb | Binary JSON | metadata |
array | Array values | tags |
Column Constraints
Define constraints to enforce data integrity rules:
models:
- name: dim_products
columns:
- name: product_id
data_type: integer
constraints:
- primary_key
tests:
- unique
- not_null
- name: sku
data_type: varchar(50)
constraints:
- unique
- not_null
- name: price
data_type: decimal(10,2)
constraints:
- not_null
- check: "price >= 0"
- name: category_id
data_type: integer
constraints:
- foreign_key:
to: dim_categories
field: category_id
- name: quantity
data_type: integer
constraints:
- check: "quantity >= 0"
Constraint Types
| Constraint | Description | Usage |
|---|---|---|
primary_key | Unique identifier for the row | Primary key columns |
unique | No duplicate values allowed | Business keys |
not_null | NULL values not allowed | Required fields |
foreign_key | References another table | Foreign key columns |
check | Custom validation expression | Business rules |
Schema Validation Configuration
Configure schema validation behavior in your project:
# models/marts/schema.yml
version: 2
models:
- name: fct_orders
config:
schema_validation:
enabled: true
on_type_mismatch: error # error, warn, ignore
on_missing_column: error # error, warn, ignore
on_extra_column: warn # error, warn, ignore
strict_types: false # Exact type matching
columns:
- name: order_id
data_type: integer
tests:
- unique
- not_null
Validation Options
| Option | Values | Description |
|---|---|---|
on_type_mismatch | error, warn, ignore | Action when column type differs |
on_missing_column | error, warn, ignore | Action when expected column missing |
on_extra_column | error, warn, ignore | Action when unexpected column found |
strict_types | true, false | Require exact type match vs compatible |
Complete Example
Here is a comprehensive schema definition with types and constraints:
# models/marts/schema.yml
version: 2
models:
- name: fct_orders
description: "Order transactions fact table"
config:
schema_validation:
enabled: true
on_type_mismatch: error
on_missing_column: error
columns:
- name: order_id
description: "Unique order identifier"
data_type: bigint
constraints:
- primary_key
tests:
- unique
- not_null
- name: customer_id
description: "Reference to customer"
data_type: bigint
constraints:
- not_null
- foreign_key:
to: dim_customers
field: customer_id
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- name: order_date
description: "Order placement date"
data_type: date
constraints:
- not_null
tests:
- not_null
- name: gross_amount
description: "Total before discounts"
data_type: decimal(12,2)
constraints:
- not_null
- check: "gross_amount >= 0"
tests:
- not_null
- name: discount_amount
description: "Applied discount"
data_type: decimal(12,2)
constraints:
- check: "discount_amount >= 0"
- name: net_amount
description: "Total after discounts"
data_type: decimal(12,2)
constraints:
- not_null
- check: "net_amount >= 0"
tests:
- not_null
- name: status
description: "Current order status"
data_type: varchar(20)
constraints:
- not_null
tests:
- not_null
- accepted_values:
values: ['pending', 'processing', 'completed', 'cancelled', 'refunded']
- name: shipping_address
description: "Delivery address as JSON"
data_type: jsonb
- name: is_gift
description: "Gift order flag"
data_type: boolean
constraints:
- not_null
- name: created_at
description: "Record creation timestamp"
data_type: timestamptz
constraints:
- not_null
tests:
- not_null
- name: updated_at
description: "Last update timestamp"
data_type: timestamptz
- name: dim_customers
description: "Customer dimension table"
columns:
- name: customer_id
description: "Primary key"
data_type: bigint
constraints:
- primary_key
tests:
- unique
- not_null
- name: email
description: "Customer email"
data_type: varchar(255)
constraints:
- unique
- not_null
tests:
- unique
- not_null
- name: first_name
data_type: varchar(100)
- name: last_name
data_type: varchar(100)
- name: registration_date
data_type: date
constraints:
- not_null
- name: lifetime_value
data_type: decimal(12,2)
constraints:
- check: "lifetime_value >= 0"
Running Schema Validation
Execute schema validation using the CLI:
# Validate schemas during compilation
olytix-core compile --validate-schema
# Run schema tests only
olytix-core test --schema-only
# Validate specific models
olytix-core test --schema-only --select fct_orders dim_customers
# Show detailed schema comparison
olytix-core test --schema-only --verbose
Example Output:
Validating schemas...
fct_orders:
✓ order_id: bigint (matches)
✓ customer_id: bigint (matches)
✓ order_date: date (matches)
✗ gross_amount: expected decimal(12,2), got decimal(10,2)
✓ status: varchar(20) (matches)
✓ created_at: timestamptz (matches)
dim_customers:
✓ customer_id: bigint (matches)
✓ email: varchar(255) (matches)
⚠ phone: unexpected column found
Schema validation: 1 error, 1 warning
Best Practices
1. Define Types for All Columns
Always specify data_type for documentation and validation:
columns:
- name: amount
data_type: decimal(10,2)
description: "Transaction amount in USD"
2. Use Appropriate Precision
Match precision to business requirements:
# Currency amounts
- name: price
data_type: decimal(12,2) # 10 digits + 2 decimal places
# Percentages
- name: tax_rate
data_type: decimal(5,4) # e.g., 0.0825 for 8.25%
# Large IDs
- name: event_id
data_type: bigint # Supports large sequences
3. Combine Schema and Data Tests
Use both test types for comprehensive validation:
columns:
- name: customer_id
data_type: bigint # Schema test
constraints:
- not_null
- foreign_key:
to: dim_customers
field: customer_id
tests: # Data tests
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
4. Document Constraints
Explain the business reason for constraints:
columns:
- name: quantity
description: "Order quantity - must be positive"
data_type: integer
constraints:
- check: "quantity > 0"
Next Steps
Now that you understand schema tests:
- CI/CD Integration - Automate testing in pipelines
- Data Tests - Value validation tests