Skip to main content

Cube Fundamentals

For Data Analysts

Cubes are the central building blocks of Olytix Core's semantic layer. They provide a semantic abstraction over your data that defines what can be queried, how metrics are calculated, and how dimensions slice the data.

What is a Cube?

A cube defines:

  • Measures: Numeric aggregations (counts, sums, averages)
  • Dimensions: Attributes for grouping and filtering
  • Joins: Relationships to other cubes
  • Segments: Reusable filter conditions
  • Pre-aggregations: Cached summaries for performance

Cubes sit between your transformed data models and the queries executed by end users, providing a consistent semantic interface.

┌─────────────────────────────────────────────────────────────┐
│ Data Flow with Cubes │
├─────────────────────────────────────────────────────────────┤
│ │
│ Models (SQL) Cubes (Semantic) Queries │
│ ┌──────────┐ ┌──────────────┐ ┌──────────┐ │
│ │fct_orders│ ──────► │ orders cube │ ◄──── │ API/SQL │ │
│ └──────────┘ │ - measures │ └──────────┘ │
│ │ - dimensions│ │
│ ┌──────────┐ │ - joins │ │
│ │dim_custs │ ──────► └──────────────┘ │
│ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘

Cube File Structure

Cubes are defined in YAML files within the cubes/ directory. Each file can contain one or more cube definitions.

my_project/
├── cubes/
│ ├── orders.yml
│ ├── customers.yml
│ └── products.yml

Basic Cube Definition

A minimal cube requires a name and a data source (either SQL or a table reference):

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
description: "Order transactions for e-commerce analytics"

Complete Cube Structure

Here is a comprehensive cube definition showing all available options:

cubes:
- name: orders
# Data source - choose one of: sql, sql_table, or model
sql: "SELECT * FROM {{ ref('fct_orders') }}"
# sql_table: "schema.orders" # Direct table reference
# model: fct_orders # Reference to a Olytix Core model

# Metadata
title: "Orders"
description: |
Order transactions for e-commerce analytics.
Contains all completed and pending orders.
public: true

# Governance metadata
meta:
owner: analytics-team
certified: true
refresh_frequency: hourly
data_classification: internal

# Optional: data source override
data_source: default

# Optional: refresh key for caching
refresh_key:
sql: "SELECT MAX(updated_at) FROM orders"

# Measures - numeric aggregations
measures:
- name: count
type: count
description: "Total number of orders"

- name: total_revenue
type: sum
sql: total_amount
format: currency
description: "Total revenue from all orders"

- name: avg_order_value
type: avg
sql: total_amount
format: currency
description: "Average order amount"

# Dimensions - grouping attributes
dimensions:
- name: order_id
type: number
sql: order_id
primary_key: true
description: "Unique order identifier"

- name: order_date
type: time
sql: order_date
granularities:
- day
- week
- month
- quarter
- year
description: "Date the order was placed"

- name: status
type: string
sql: status
description: "Order status"

# Joins - relationships to other cubes
joins:
- name: customers
sql: "{orders}.customer_id = {customers}.customer_id"
relationship: belongs_to

# Segments - reusable filters
segments:
- name: completed
sql: "status = 'completed'"
description: "Only completed orders"

# Pre-aggregations - performance optimization
pre_aggregations:
- name: orders_by_day
measures:
- count
- total_revenue
dimensions:
- status
time_dimension: order_date
granularity: day
refresh:
every: "1 hour"

Data Source Options

Olytix Core provides three ways to specify the underlying data for a cube:

Using SQL with ref()

Reference a Olytix Core model using the ref() function:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

Using SQL with source()

Reference a raw source table:

cubes:
- name: raw_orders
sql: "SELECT * FROM {{ source('raw', 'orders') }}"

Using sql_table

Reference a table directly by name:

cubes:
- name: orders
sql_table: "analytics.fct_orders"

Using model

Reference a Olytix Core model by name:

cubes:
- name: orders
model: fct_orders

Cube Visibility

Control whether a cube is exposed via the API:

cubes:
- name: internal_orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
public: false # Hidden from API discovery

Private cubes can still be referenced by other cubes via joins but are not directly queryable.

Refresh Keys

Define how Olytix Core determines if cached data needs refreshing:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
refresh_key:
sql: "SELECT MAX(updated_at) FROM fct_orders"

The refresh key SQL should return a single value that changes when the underlying data changes.

Extending Cubes

Create a new cube by extending an existing one:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
measures:
- name: count
type: count

- name: completed_orders
extends: orders
description: "Only completed orders"
sql: "SELECT * FROM {{ ref('fct_orders') }} WHERE status = 'completed'"

The extended cube inherits all measures, dimensions, and other properties from the parent cube.

Metadata for Governance

Use the meta field to store governance information:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
meta:
owner: analytics-team
certified: true
certification_date: "2024-01-15"
refresh_frequency: hourly
data_classification: confidential
pii_fields:
- customer_email
- billing_address
tags:
- finance
- revenue

This metadata is accessible via the API and can be used for documentation, discovery, and access control.

Best Practices

Naming Conventions

ElementConventionExample
Cube nameSingular nounorders, customer, product
File nameMatch cube nameorders.yml, customer.yml
Measure nameAction prefixtotal_revenue, avg_order_value, count_customers
Dimension nameDescriptiveorder_date, customer_region, product_category

One Cube Per Fact Table

Create one cube per logical entity or fact table:

# cubes/orders.yml
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

# cubes/customers.yml
cubes:
- name: customers
sql: "SELECT * FROM {{ ref('dim_customers') }}"

Always Define Primary Keys

Primary keys enable efficient joins and are required for proper lineage:

dimensions:
- name: order_id
type: number
sql: order_id
primary_key: true

Document Everything

Include descriptions for the cube and all its members:

cubes:
- name: orders
description: |
## Overview
Order transactions for e-commerce analytics.

## Data Source
Aggregated from raw order events.

## Refresh Schedule
Updated hourly from the data warehouse.
measures:
- name: total_revenue
description: "Sum of all order amounts before discounts"

Validation and Compilation

Validate your cube definitions by compiling the project:

olytix-core compile

Expected output:

Compiling project...
✓ Loaded 2 sources with 5 tables
✓ Compiled 8 models
✓ Registered 4 cubes with 24 measures and 32 dimensions
✓ Built column-level lineage
✓ Generated manifest at target/manifest.json

Compilation completed in 0.52s

Next Steps