Sources Schema
Sources represent the raw data tables from your data warehouse that serve as the foundation of your analytics project. This reference documents all fields available when defining sources.
File Location
Source definitions are stored in YAML files within the sources/ directory of your project:
my-project/
└── sources/
├── raw.yml
├── external.yml
└── staging/
└── crm.yml
Top-Level Structure
version: 2
sources:
- name: source_name
# ... source configuration
Source Definition
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | - | Unique identifier for the source |
description | string | No | null | Human-readable description |
database | string | No | null | Database name containing the source tables |
schema | string | No | null | Schema name containing the source tables |
tables | list | Yes | [] | List of table definitions |
meta | object | No | {} | Custom metadata key-value pairs |
Example
version: 2
sources:
- name: raw_ecommerce
description: Raw e-commerce data from the production database
database: analytics_prod
schema: raw
tables:
- name: orders
# ... table configuration
Table Definition
Each table within a source has the following fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | - | Name of the table in the database |
description | string | No | null | Human-readable description |
columns | list | No | [] | List of column definitions |
freshness | object | No | null | Freshness configuration for data quality checks |
loaded_at_field | string | No | null | Column name used for freshness checks |
meta | object | No | {} | Custom metadata key-value pairs |
Example
tables:
- name: orders
description: Raw e-commerce orders from the storefront
loaded_at_field: created_at
freshness:
warn_after:
count: 12
period: hour
error_after:
count: 24
period: hour
columns:
- name: id
# ... column configuration
Column Definition
Each column within a table has the following fields:
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
name | string | Yes | - | Name of the column |
description | string | No | null | Human-readable description |
data_type | string | No | null | SQL data type (e.g., INTEGER, VARCHAR, TIMESTAMP) |
tests | list | No | [] | List of tests to run on this column |
meta | object | No | {} | Custom metadata key-value pairs |
Example
columns:
- name: id
description: Unique order identifier
data_type: INTEGER
tests:
- unique
- not_null
- name: customer_id
description: Reference to the customer who placed the order
data_type: INTEGER
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: id
- name: total_amount
description: Total order value in USD
data_type: DECIMAL(10,2)
meta:
sensitivity: pii
Freshness Configuration
Freshness checks help ensure your source data is being updated as expected.
| Field | Type | Required | Description |
|---|---|---|---|
warn_after | object | No | Threshold for warning alerts |
error_after | object | No | Threshold for error alerts |
Threshold Object
| Field | Type | Required | Description |
|---|---|---|---|
count | integer | Yes | Number of time periods |
period | string | Yes | Time period: minute, hour, day |
Example
freshness:
warn_after:
count: 6
period: hour
error_after:
count: 12
period: hour
Column Tests
Olytix Core supports several built-in tests for data quality validation:
| Test | Description |
|---|---|
unique | Values must be unique across all rows |
not_null | Values cannot be null |
accepted_values | Values must be from a predefined list |
relationships | Values must exist in a referenced table |
Example with Tests
columns:
- name: status
description: Order status
tests:
- not_null
- accepted_values:
values: ['pending', 'completed', 'cancelled', 'refunded']
- name: region_id
description: Geographic region reference
tests:
- relationships:
to: ref('dim_regions')
field: id
Complete Example
version: 2
sources:
- name: raw
description: Raw data from production systems
database: analytics
schema: public
meta:
owner: data-engineering
slack_channel: "#data-alerts"
tables:
- name: orders
description: E-commerce order transactions
loaded_at_field: created_at
freshness:
warn_after:
count: 12
period: hour
error_after:
count: 24
period: hour
columns:
- name: id
description: Primary key
data_type: INTEGER
tests:
- unique
- not_null
- name: customer_id
description: Customer reference
data_type: INTEGER
tests:
- not_null
- name: total_amount
description: Order total in USD
data_type: DECIMAL(10,2)
- name: status
description: Order status
data_type: VARCHAR(50)
tests:
- accepted_values:
values: ['pending', 'completed', 'cancelled']
- name: created_at
description: Order creation timestamp
data_type: TIMESTAMP
tests:
- not_null
- name: customers
description: Customer master data
columns:
- name: id
description: Primary key
data_type: INTEGER
tests:
- unique
- not_null
- name: email
description: Customer email address
data_type: VARCHAR(255)
meta:
sensitivity: pii
- name: products
description: Product catalog
columns:
- name: id
description: Primary key
- name: name
description: Product name
- name: category
description: Product category
- name: price
description: Unit price in USD
Referencing Sources
In models, reference source tables using the source() function:
SELECT *
FROM {{ source('raw', 'orders') }}
This creates a dependency link that appears in the project lineage graph.
Best Practices
-
Always provide descriptions: Document the purpose and content of each source, table, and column.
-
Define column types: Specify
data_typefor documentation and validation purposes. -
Configure freshness checks: Set up freshness monitoring for critical source tables.
-
Use metadata for governance: Leverage the
metafield to track ownership, sensitivity, and other governance attributes. -
Organize by domain: Group related sources in subdirectories (e.g.,
sources/crm/,sources/finance/).