Skip to main content

Sources Schema

Sources represent the raw data tables from your data warehouse that serve as the foundation of your analytics project. This reference documents all fields available when defining sources.

File Location

Source definitions are stored in YAML files within the sources/ directory of your project:

my-project/
└── sources/
├── raw.yml
├── external.yml
└── staging/
└── crm.yml

Top-Level Structure

version: 2

sources:
- name: source_name
# ... source configuration

Source Definition

FieldTypeRequiredDefaultDescription
namestringYes-Unique identifier for the source
descriptionstringNonullHuman-readable description
databasestringNonullDatabase name containing the source tables
schemastringNonullSchema name containing the source tables
tableslistYes[]List of table definitions
metaobjectNo{}Custom metadata key-value pairs

Example

version: 2

sources:
- name: raw_ecommerce
description: Raw e-commerce data from the production database
database: analytics_prod
schema: raw
tables:
- name: orders
# ... table configuration

Table Definition

Each table within a source has the following fields:

FieldTypeRequiredDefaultDescription
namestringYes-Name of the table in the database
descriptionstringNonullHuman-readable description
columnslistNo[]List of column definitions
freshnessobjectNonullFreshness configuration for data quality checks
loaded_at_fieldstringNonullColumn name used for freshness checks
metaobjectNo{}Custom metadata key-value pairs

Example

tables:
- name: orders
description: Raw e-commerce orders from the storefront
loaded_at_field: created_at
freshness:
warn_after:
count: 12
period: hour
error_after:
count: 24
period: hour
columns:
- name: id
# ... column configuration

Column Definition

Each column within a table has the following fields:

FieldTypeRequiredDefaultDescription
namestringYes-Name of the column
descriptionstringNonullHuman-readable description
data_typestringNonullSQL data type (e.g., INTEGER, VARCHAR, TIMESTAMP)
testslistNo[]List of tests to run on this column
metaobjectNo{}Custom metadata key-value pairs

Example

columns:
- name: id
description: Unique order identifier
data_type: INTEGER
tests:
- unique
- not_null

- name: customer_id
description: Reference to the customer who placed the order
data_type: INTEGER
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: id

- name: total_amount
description: Total order value in USD
data_type: DECIMAL(10,2)
meta:
sensitivity: pii

Freshness Configuration

Freshness checks help ensure your source data is being updated as expected.

FieldTypeRequiredDescription
warn_afterobjectNoThreshold for warning alerts
error_afterobjectNoThreshold for error alerts

Threshold Object

FieldTypeRequiredDescription
countintegerYesNumber of time periods
periodstringYesTime period: minute, hour, day

Example

freshness:
warn_after:
count: 6
period: hour
error_after:
count: 12
period: hour

Column Tests

Olytix Core supports several built-in tests for data quality validation:

TestDescription
uniqueValues must be unique across all rows
not_nullValues cannot be null
accepted_valuesValues must be from a predefined list
relationshipsValues must exist in a referenced table

Example with Tests

columns:
- name: status
description: Order status
tests:
- not_null
- accepted_values:
values: ['pending', 'completed', 'cancelled', 'refunded']

- name: region_id
description: Geographic region reference
tests:
- relationships:
to: ref('dim_regions')
field: id

Complete Example

version: 2

sources:
- name: raw
description: Raw data from production systems
database: analytics
schema: public
meta:
owner: data-engineering
slack_channel: "#data-alerts"
tables:
- name: orders
description: E-commerce order transactions
loaded_at_field: created_at
freshness:
warn_after:
count: 12
period: hour
error_after:
count: 24
period: hour
columns:
- name: id
description: Primary key
data_type: INTEGER
tests:
- unique
- not_null
- name: customer_id
description: Customer reference
data_type: INTEGER
tests:
- not_null
- name: total_amount
description: Order total in USD
data_type: DECIMAL(10,2)
- name: status
description: Order status
data_type: VARCHAR(50)
tests:
- accepted_values:
values: ['pending', 'completed', 'cancelled']
- name: created_at
description: Order creation timestamp
data_type: TIMESTAMP
tests:
- not_null

- name: customers
description: Customer master data
columns:
- name: id
description: Primary key
data_type: INTEGER
tests:
- unique
- not_null
- name: email
description: Customer email address
data_type: VARCHAR(255)
meta:
sensitivity: pii

- name: products
description: Product catalog
columns:
- name: id
description: Primary key
- name: name
description: Product name
- name: category
description: Product category
- name: price
description: Unit price in USD

Referencing Sources

In models, reference source tables using the source() function:

SELECT *
FROM {{ source('raw', 'orders') }}

This creates a dependency link that appears in the project lineage graph.

Best Practices

  1. Always provide descriptions: Document the purpose and content of each source, table, and column.

  2. Define column types: Specify data_type for documentation and validation purposes.

  3. Configure freshness checks: Set up freshness monitoring for critical source tables.

  4. Use metadata for governance: Leverage the meta field to track ownership, sensitivity, and other governance attributes.

  5. Organize by domain: Group related sources in subdirectories (e.g., sources/crm/, sources/finance/).