Sources Schema

Sources represent the raw data tables from your data warehouse that serve as the foundation of your analytics project. This reference documents all fields available when defining sources.

File Location

Source definitions are stored in YAML files within the sources/ directory of your project:

my-project/
└── sources/
    ├── raw.yml
    ├── external.yml
    └── staging/
        └── crm.yml
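
The layout above can be walked with a few lines of standard-library Python. This is only a sketch: the helper name `find_source_files` is hypothetical, and it assumes that files in nested folders such as sources/staging/ are picked up recursively and that only the `.yml` extension is used, neither of which this reference states explicitly.

```python
from pathlib import Path


def find_source_files(project_root: str) -> list[Path]:
    """Return every .yml file under <project_root>/sources, including subdirectories.

    Assumption: recursive discovery and the .yml extension only.
    """
    sources_dir = Path(project_root) / "sources"
    # rglob walks the directory tree; sorting gives a stable, repeatable order.
    return sorted(sources_dir.rglob("*.yml"))
```

For the tree shown above, calling `find_source_files("my-project")` would list `external.yml`, `raw.yml`, and `staging/crm.yml` under sources/.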

Top-Level Structure

version: 2

sources:
  - name: source_name
    # ... source configuration

Source Definition

Field	Type	Required	Default	Description
`name`	string	Yes	-	Unique identifier for the source
`description`	string	No	`null`	Human-readable description
`database`	string	No	`null`	Database name containing the source tables
`schema`	string	No	`null`	Schema name containing the source tables
`tables`	list	Yes	-	List of table definitions
`meta`	object	No	`{}`	Custom metadata key-value pairs

Example

version: 2

sources:
  - name: raw_ecommerce
    description: Raw e-commerce data from the production database
    database: analytics_prod
    schema: raw
    tables:
      - name: orders
        # ... table configuration

Table Definition

Each table within a source has the following fields:

Field	Type	Required	Default	Description
`name`	string	Yes	-	Name of the table in the database
`description`	string	No	`null`	Human-readable description
`columns`	list	No	`[]`	List of column definitions
`freshness`	object	No	`null`	Freshness configuration for data quality checks
`loaded_at_field`	string	No	`null`	Name of the timestamp column used to measure freshness
`meta`	object	No	`{}`	Custom metadata key-value pairs

Example

tables:
  - name: orders
    description: Raw e-commerce orders from the storefront
    loaded_at_field: created_at
    freshness:
      warn_after:
        count: 12
        period: hour
      error_after:
        count: 24
        period: hour
    columns:
      - name: id
        # ... column configuration

Column Definition

Each column within a table has the following fields:

Field	Type	Required	Default	Description
`name`	string	Yes	-	Name of the column
`description`	string	No	`null`	Human-readable description
`data_type`	string	No	`null`	SQL data type (e.g., `INTEGER`, `VARCHAR`, `TIMESTAMP`)
`tests`	list	No	`[]`	List of tests to run on this column
`meta`	object	No	`{}`	Custom metadata key-value pairs

Example

columns:
  - name: id
    description: Unique order identifier
    data_type: INTEGER
    tests:
      - unique
      - not_null

  - name: customer_id
    description: Reference to the customer who placed the order
    data_type: INTEGER
    tests:
      - not_null
      - relationships:
          to: ref('dim_customers')
          field: id

  - name: total_amount
    description: Total order value in USD
    data_type: DECIMAL(10,2)
    meta:
      sensitivity: pii
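
The Required columns in the source, table, and column tables can be checked mechanically before a project is run. The sketch below is illustrative, not Olytix Core's own validator: `validate_source` and `_has_name` are hypothetical helpers, and the input is assumed to be one entry of `sources` already parsed from YAML into a plain dict.

```python
def _has_name(obj) -> bool:
    """True if obj is a mapping with a non-empty string 'name'."""
    return isinstance(obj, dict) and isinstance(obj.get("name"), str) and obj["name"] != ""


def validate_source(source: dict) -> list[str]:
    """Collect messages for missing required fields in one parsed source entry."""
    errors = []
    if not _has_name(source):
        errors.append("source: 'name' is required")

    tables = source.get("tables")
    if not isinstance(tables, list):
        errors.append("source: 'tables' is required and must be a list")
        return errors

    for t_idx, table in enumerate(tables):
        if not _has_name(table):
            errors.append(f"tables[{t_idx}]: 'name' is required")
        # 'columns' is optional; treat a missing or empty value as no columns.
        columns = (table.get("columns") or []) if isinstance(table, dict) else []
        for c_idx, column in enumerate(columns):
            if not _has_name(column):
                errors.append(f"tables[{t_idx}].columns[{c_idx}]: 'name' is required")
    return errors
```

An empty list means every required `name` and the `tables` list are present; optional fields such as `description` or `meta` are deliberately not checked here.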

Freshness Configuration

Freshness checks verify that source data is being updated as expected, using the table's `loaded_at_field` and the thresholds below.

Field	Type	Required	Description
`warn_after`	object	No	Data age beyond which the check raises a warning
`error_after`	object	No	Data age beyond which the check raises an error

Threshold Object

Field	Type	Required	Description
`count`	integer	Yes	Number of periods
`period`	string	Yes	Unit of time; one of `minute`, `hour`, or `day`

Example

freshness:
  warn_after:
    count: 6
    period: hour
  error_after:
    count: 12
    period: hour

Column Tests

Olytix Core supports several built-in tests for data quality validation:

Test	Description
`unique`	Values must be unique across all rows
`not_null`	Values cannot be null
`accepted_values`	Values must be from a predefined list
`relationships`	Values must exist in a referenced table

Example with Tests

columns:
  - name: status
    description: Order status
    tests:
      - not_null
      - accepted_values:
          values: ['pending', 'completed', 'cancelled', 'refunded']

  - name: region_id
    description: Geographic region reference
    tests:
      - relationships:
          to: ref('dim_regions')
          field: id
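
The four built-in tests can be read as simple predicates over a column's values. The following sketch expresses that reading in plain Python; it is not how Olytix Core runs the tests, and it assumes SQL-like handling in which nulls are ignored by every test except `not_null`. All function names are hypothetical.

```python
from collections import Counter


def failing_unique(values: list) -> list:
    """Values that occur more than once (nulls ignored)."""
    counts = Counter(v for v in values if v is not None)
    return [v for v, n in counts.items() if n > 1]


def failing_not_null(values: list) -> int:
    """Number of null entries in the column."""
    return sum(1 for v in values if v is None)


def failing_accepted_values(values: list, accepted: list) -> list:
    """Distinct non-null values that are not in the accepted list."""
    allowed = set(accepted)
    return list(dict.fromkeys(v for v in values if v is not None and v not in allowed))


def failing_relationships(values: list, referenced: list) -> list:
    """Distinct non-null values with no match in the referenced column."""
    known = set(referenced)
    return list(dict.fromkeys(v for v in values if v is not None and v not in known))
```

Each helper returns the offending values (or a count for `not_null`), so an empty list or zero corresponds to a passing test.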

Complete Example

version: 2

sources:
  - name: raw
    description: Raw data from production systems
    database: analytics
    schema: public
    meta:
      owner: data-engineering
      slack_channel: "#data-alerts"
    tables:
      - name: orders
        description: E-commerce order transactions
        loaded_at_field: created_at
        freshness:
          warn_after:
            count: 12
            period: hour
          error_after:
            count: 24
            period: hour
        columns:
          - name: id
            description: Primary key
            data_type: INTEGER
            tests:
              - unique
              - not_null
          - name: customer_id
            description: Customer reference
            data_type: INTEGER
            tests:
              - not_null
          - name: total_amount
            description: Order total in USD
            data_type: DECIMAL(10,2)
          - name: status
            description: Order status
            data_type: VARCHAR(50)
            tests:
              - accepted_values:
                  values: ['pending', 'completed', 'cancelled']
          - name: created_at
            description: Order creation timestamp
            data_type: TIMESTAMP
            tests:
              - not_null

      - name: customers
        description: Customer master data
        columns:
          - name: id
            description: Primary key
            data_type: INTEGER
            tests:
              - unique
              - not_null
          - name: email
            description: Customer email address
            data_type: VARCHAR(255)
            meta:
              sensitivity: pii

      - name: products
        description: Product catalog
        columns:
          - name: id
            description: Primary key
          - name: name
            description: Product name
          - name: category
            description: Product category
          - name: price
            description: Unit price in USD

Referencing Sources

In models, reference source tables using the source() function:

SELECT *
FROM {{ source('raw', 'orders') }}

This creates a dependency link that appears in the project lineage graph.
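
One way to picture what `source('raw', 'orders')` points at is a lookup from the source and table names to a qualified table name. The sketch below assumes the result is `database.schema.table`, with unset parts omitted; the actual rendering performed by Olytix Core is not described in this reference, and `resolve_source` is a hypothetical helper operating on already-parsed source definitions.

```python
def resolve_source(sources: list[dict], source_name: str, table_name: str) -> str:
    """Look up a declared source table and return a dotted, qualified name.

    Assumption: the qualified name is database.schema.table, skipping null parts.
    """
    for source in sources:
        if source["name"] != source_name:
            continue
        for table in source.get("tables", []):
            if table["name"] == table_name:
                parts = [source.get("database"), source.get("schema"), table_name]
                return ".".join(p for p in parts if p)
        raise KeyError(f"table {table_name!r} is not declared in source {source_name!r}")
    raise KeyError(f"source {source_name!r} is not declared")


sources = [
    {
        "name": "raw",
        "database": "analytics",
        "schema": "public",
        "tables": [{"name": "orders"}, {"name": "customers"}],
    }
]
print(resolve_source(sources, "raw", "orders"))  # analytics.public.orders
```

Raising an error for undeclared names mirrors the idea that a `source()` call should only reference tables that appear in a source definition.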

Best Practices

Always provide descriptions: Document the purpose and content of each source, table, and column.
Define column types: Specify data_type for documentation and validation purposes.
Configure freshness checks: Set up freshness monitoring for critical source tables.
Use metadata for governance: Leverage the meta field to track ownership, sensitivity, and other governance attributes.
Organize by domain: Group related sources in subdirectories (e.g., sources/crm/, sources/finance/).
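
The first practice is easy to enforce automatically. As a rough sketch, assuming the source definitions have already been parsed into plain dicts, a small lint step could list every source, table, and column that lacks a description. The helper name `missing_descriptions` is hypothetical and not part of Olytix Core.

```python
def missing_descriptions(sources: list[dict]) -> list[str]:
    """Return dotted paths of sources, tables, and columns without a description."""
    missing = []
    for source in sources:
        source_path = source["name"]
        if not source.get("description"):
            missing.append(source_path)
        for table in source.get("tables") or []:
            table_path = f"{source_path}.{table['name']}"
            if not table.get("description"):
                missing.append(table_path)
            for column in table.get("columns") or []:
                if not column.get("description"):
                    missing.append(f"{table_path}.{column['name']}")
    return missing
```

Running this in continuous integration and failing the build on a non-empty result is one possible way to keep documentation coverage from regressing.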
