Skip to main content

Schema Tests

For Data Analysts

Schema tests validate the structure and types of your data, ensuring columns match expected definitions and constraints are enforced.

Overview

While data tests validate values, schema tests validate structure:

Test TypePurpose
Data TestsValue correctness (unique, not_null, accepted_values)
Schema TestsStructure correctness (types, constraints, column existence)
┌─────────────────────────────────────────────────────────────────────┐
│ Schema Validation │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Expected Schema Actual Schema │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ order_id: INTEGER│ ─► │ order_id: INTEGER│ ✓ Match │
│ │ amount: DECIMAL │ ─► │ amount: VARCHAR │ ✗ Mismatch │
│ │ created: DATE │ ─► │ created: DATE │ ✓ Match │
│ └──────────────────┘ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘

Column Type Validation

Define expected column types in your schema YAML:

# models/marts/schema.yml
version: 2

models:
- name: fct_orders
description: "Order fact table with type validation"

columns:
- name: order_id
description: "Primary key"
data_type: integer
tests:
- unique
- not_null

- name: customer_id
description: "Foreign key to customers"
data_type: integer
tests:
- not_null

- name: order_date
description: "Date order was placed"
data_type: date
tests:
- not_null

- name: net_amount
description: "Order total after discounts"
data_type: decimal(10,2)
tests:
- not_null

- name: status
description: "Order status"
data_type: varchar(50)
tests:
- accepted_values:
values: ['completed', 'pending', 'cancelled']

- name: is_active
description: "Whether order is active"
data_type: boolean

- name: metadata
description: "Additional order metadata"
data_type: jsonb

Supported Data Types

Olytix Core supports standard SQL data types across different warehouses:

Numeric Types

TypeDescriptionExample
integerWhole numbersorder_count
bigintLarge integersevent_id
smallintSmall integersquantity
decimal(p,s)Exact numericamount
numeric(p,s)Same as decimalprice
floatFloating pointpercentage
doubleDouble precisionscore

String Types

TypeDescriptionExample
varchar(n)Variable-length stringname
char(n)Fixed-length stringcountry_code
textUnlimited textdescription

Date/Time Types

TypeDescriptionExample
dateDate onlyorder_date
timeTime onlyevent_time
timestampDate and timecreated_at
timestamptzTimestamp with timezoneevent_at
intervalTime intervalduration

Other Types

TypeDescriptionExample
booleanTrue/falseis_active
uuidUUID valuesuser_uuid
jsonJSON dataproperties
jsonbBinary JSONmetadata
arrayArray valuestags

Column Constraints

Define constraints to enforce data integrity rules:

models:
- name: dim_products
columns:
- name: product_id
data_type: integer
constraints:
- primary_key
tests:
- unique
- not_null

- name: sku
data_type: varchar(50)
constraints:
- unique
- not_null

- name: price
data_type: decimal(10,2)
constraints:
- not_null
- check: "price >= 0"

- name: category_id
data_type: integer
constraints:
- foreign_key:
to: dim_categories
field: category_id

- name: quantity
data_type: integer
constraints:
- check: "quantity >= 0"

Constraint Types

ConstraintDescriptionUsage
primary_keyUnique identifier for the rowPrimary key columns
uniqueNo duplicate values allowedBusiness keys
not_nullNULL values not allowedRequired fields
foreign_keyReferences another tableForeign key columns
checkCustom validation expressionBusiness rules

Schema Validation Configuration

Configure schema validation behavior in your project:

# models/marts/schema.yml
version: 2

models:
- name: fct_orders
config:
schema_validation:
enabled: true
on_type_mismatch: error # error, warn, ignore
on_missing_column: error # error, warn, ignore
on_extra_column: warn # error, warn, ignore
strict_types: false # Exact type matching

columns:
- name: order_id
data_type: integer
tests:
- unique
- not_null

Validation Options

OptionValuesDescription
on_type_mismatcherror, warn, ignoreAction when column type differs
on_missing_columnerror, warn, ignoreAction when expected column missing
on_extra_columnerror, warn, ignoreAction when unexpected column found
strict_typestrue, falseRequire exact type match vs compatible

Complete Example

Here is a comprehensive schema definition with types and constraints:

# models/marts/schema.yml
version: 2

models:
- name: fct_orders
description: "Order transactions fact table"

config:
schema_validation:
enabled: true
on_type_mismatch: error
on_missing_column: error

columns:
- name: order_id
description: "Unique order identifier"
data_type: bigint
constraints:
- primary_key
tests:
- unique
- not_null

- name: customer_id
description: "Reference to customer"
data_type: bigint
constraints:
- not_null
- foreign_key:
to: dim_customers
field: customer_id
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id

- name: order_date
description: "Order placement date"
data_type: date
constraints:
- not_null
tests:
- not_null

- name: gross_amount
description: "Total before discounts"
data_type: decimal(12,2)
constraints:
- not_null
- check: "gross_amount >= 0"
tests:
- not_null

- name: discount_amount
description: "Applied discount"
data_type: decimal(12,2)
constraints:
- check: "discount_amount >= 0"

- name: net_amount
description: "Total after discounts"
data_type: decimal(12,2)
constraints:
- not_null
- check: "net_amount >= 0"
tests:
- not_null

- name: status
description: "Current order status"
data_type: varchar(20)
constraints:
- not_null
tests:
- not_null
- accepted_values:
values: ['pending', 'processing', 'completed', 'cancelled', 'refunded']

- name: shipping_address
description: "Delivery address as JSON"
data_type: jsonb

- name: is_gift
description: "Gift order flag"
data_type: boolean
constraints:
- not_null

- name: created_at
description: "Record creation timestamp"
data_type: timestamptz
constraints:
- not_null
tests:
- not_null

- name: updated_at
description: "Last update timestamp"
data_type: timestamptz


- name: dim_customers
description: "Customer dimension table"

columns:
- name: customer_id
description: "Primary key"
data_type: bigint
constraints:
- primary_key
tests:
- unique
- not_null

- name: email
description: "Customer email"
data_type: varchar(255)
constraints:
- unique
- not_null
tests:
- unique
- not_null

- name: first_name
data_type: varchar(100)

- name: last_name
data_type: varchar(100)

- name: registration_date
data_type: date
constraints:
- not_null

- name: lifetime_value
data_type: decimal(12,2)
constraints:
- check: "lifetime_value >= 0"

Running Schema Validation

Execute schema validation using the CLI:

# Validate schemas during compilation
olytix-core compile --validate-schema

# Run schema tests only
olytix-core test --schema-only

# Validate specific models
olytix-core test --schema-only --select fct_orders dim_customers

# Show detailed schema comparison
olytix-core test --schema-only --verbose

Example Output:

Validating schemas...

fct_orders:
✓ order_id: bigint (matches)
✓ customer_id: bigint (matches)
✓ order_date: date (matches)
✗ gross_amount: expected decimal(12,2), got decimal(10,2)
✓ status: varchar(20) (matches)
✓ created_at: timestamptz (matches)

dim_customers:
✓ customer_id: bigint (matches)
✓ email: varchar(255) (matches)
⚠ phone: unexpected column found

Schema validation: 1 error, 1 warning

Best Practices

1. Define Types for All Columns

Always specify data_type for documentation and validation:

columns:
- name: amount
data_type: decimal(10,2)
description: "Transaction amount in USD"

2. Use Appropriate Precision

Match precision to business requirements:

# Currency amounts
- name: price
data_type: decimal(12,2) # 10 digits + 2 decimal places

# Percentages
- name: tax_rate
data_type: decimal(5,4) # e.g., 0.0825 for 8.25%

# Large IDs
- name: event_id
data_type: bigint # Supports large sequences

3. Combine Schema and Data Tests

Use both test types for comprehensive validation:

columns:
- name: customer_id
data_type: bigint # Schema test
constraints:
- not_null
- foreign_key:
to: dim_customers
field: customer_id
tests: # Data tests
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id

4. Document Constraints

Explain the business reason for constraints:

columns:
- name: quantity
description: "Order quantity - must be positive"
data_type: integer
constraints:
- check: "quantity > 0"

Next Steps

Now that you understand schema tests:

  1. CI/CD Integration - Automate testing in pipelines
  2. Data Tests - Value validation tests