Cube Fundamentals
Cubes are the central building blocks of Olytix Core's semantic layer. They provide a semantic abstraction over your data that defines what can be queried, how metrics are calculated, and how dimensions slice the data.
What is a Cube?
A cube defines:
- Measures: Numeric aggregations (counts, sums, averages)
- Dimensions: Attributes for grouping and filtering
- Joins: Relationships to other cubes
- Segments: Reusable filter conditions
- Pre-aggregations: Cached summaries for performance
Cubes sit between your transformed data models and the queries executed by end users, providing a consistent semantic interface.
┌─────────────────────────────────────────────────────────────┐
│ Data Flow with Cubes │
├─────────────────────────────────────────────────────────────┤
│ │
│ Models (SQL) Cubes (Semantic) Queries │
│ ┌──────────┐ ┌──────────────┐ ┌──────────┐ │
│ │fct_orders│ ──────► │ orders cube │ ◄──── │ API/SQL │ │
│ └──────────┘ │ - measures │ └──────────┘ │
│ │ - dimensions│ │
│ ┌──────────┐ │ - joins │ │
│ │dim_custs │ ──────► └──────────────┘ │
│ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
Cube File Structure
Cubes are defined in YAML files within the cubes/ directory. Each file can contain one or more cube definitions.
my_project/
├── cubes/
│ ├── orders.yml
│ ├── customers.yml
│ └── products.yml
Basic Cube Definition
A minimal cube requires a name and a data source (either SQL or a table reference):
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
description: "Order transactions for e-commerce analytics"
Complete Cube Structure
Here is a comprehensive cube definition showing all available options:
cubes:
- name: orders
# Data source - choose one of: sql, sql_table, or model
sql: "SELECT * FROM {{ ref('fct_orders') }}"
# sql_table: "schema.orders" # Direct table reference
# model: fct_orders # Reference to a Olytix Core model
# Metadata
title: "Orders"
description: |
Order transactions for e-commerce analytics.
Contains all completed and pending orders.
public: true
# Governance metadata
meta:
owner: analytics-team
certified: true
refresh_frequency: hourly
data_classification: internal
# Optional: data source override
data_source: default
# Optional: refresh key for caching
refresh_key:
sql: "SELECT MAX(updated_at) FROM orders"
# Measures - numeric aggregations
measures:
- name: count
type: count
description: "Total number of orders"
- name: total_revenue
type: sum
sql: total_amount
format: currency
description: "Total revenue from all orders"
- name: avg_order_value
type: avg
sql: total_amount
format: currency
description: "Average order amount"
# Dimensions - grouping attributes
dimensions:
- name: order_id
type: number
sql: order_id
primary_key: true
description: "Unique order identifier"
- name: order_date
type: time
sql: order_date
granularities:
- day
- week
- month
- quarter
- year
description: "Date the order was placed"
- name: status
type: string
sql: status
description: "Order status"
# Joins - relationships to other cubes
joins:
- name: customers
sql: "{orders}.customer_id = {customers}.customer_id"
relationship: belongs_to
# Segments - reusable filters
segments:
- name: completed
sql: "status = 'completed'"
description: "Only completed orders"
# Pre-aggregations - performance optimization
pre_aggregations:
- name: orders_by_day
measures:
- count
- total_revenue
dimensions:
- status
time_dimension: order_date
granularity: day
refresh:
every: "1 hour"
Data Source Options
Olytix Core provides three ways to specify the underlying data for a cube:
Using SQL with ref()
Reference a Olytix Core model using the ref() function:
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
Using SQL with source()
Reference a raw source table:
cubes:
- name: raw_orders
sql: "SELECT * FROM {{ source('raw', 'orders') }}"
Using sql_table
Reference a table directly by name:
cubes:
- name: orders
sql_table: "analytics.fct_orders"
Using model
Reference a Olytix Core model by name:
cubes:
- name: orders
model: fct_orders
Cube Visibility
Control whether a cube is exposed via the API:
cubes:
- name: internal_orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
public: false # Hidden from API discovery
Private cubes can still be referenced by other cubes via joins but are not directly queryable.
Refresh Keys
Define how Olytix Core determines if cached data needs refreshing:
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
refresh_key:
sql: "SELECT MAX(updated_at) FROM fct_orders"
The refresh key SQL should return a single value that changes when the underlying data changes.
Extending Cubes
Create a new cube by extending an existing one:
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
measures:
- name: count
type: count
- name: completed_orders
extends: orders
description: "Only completed orders"
sql: "SELECT * FROM {{ ref('fct_orders') }} WHERE status = 'completed'"
The extended cube inherits all measures, dimensions, and other properties from the parent cube.
Metadata for Governance
Use the meta field to store governance information:
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
meta:
owner: analytics-team
certified: true
certification_date: "2024-01-15"
refresh_frequency: hourly
data_classification: confidential
pii_fields:
- customer_email
- billing_address
tags:
- finance
- revenue
This metadata is accessible via the API and can be used for documentation, discovery, and access control.
Best Practices
Naming Conventions
| Element | Convention | Example |
|---|---|---|
| Cube name | Singular noun | orders, customer, product |
| File name | Match cube name | orders.yml, customer.yml |
| Measure name | Action prefix | total_revenue, avg_order_value, count_customers |
| Dimension name | Descriptive | order_date, customer_region, product_category |
One Cube Per Fact Table
Create one cube per logical entity or fact table:
# cubes/orders.yml
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
# cubes/customers.yml
cubes:
- name: customers
sql: "SELECT * FROM {{ ref('dim_customers') }}"
Always Define Primary Keys
Primary keys enable efficient joins and are required for proper lineage:
dimensions:
- name: order_id
type: number
sql: order_id
primary_key: true
Document Everything
Include descriptions for the cube and all its members:
cubes:
- name: orders
description: |
## Overview
Order transactions for e-commerce analytics.
## Data Source
Aggregated from raw order events.
## Refresh Schedule
Updated hourly from the data warehouse.
measures:
- name: total_revenue
description: "Sum of all order amounts before discounts"
Validation and Compilation
Validate your cube definitions by compiling the project:
olytix-core compile
Expected output:
Compiling project...
✓ Loaded 2 sources with 5 tables
✓ Compiled 8 models
✓ Registered 4 cubes with 24 measures and 32 dimensions
✓ Built column-level lineage
✓ Generated manifest at target/manifest.json
Compilation completed in 0.52s
Next Steps
- Measures - Define numeric aggregations
- Dimensions - Create grouping attributes
- Joins - Connect cubes together
- Pre-aggregations - Optimize query performance