Your First Cube
Cubes are the heart of Olytix Core's semantic layer. They define how your data can be queried, what metrics are available, and how dimensions slice the data. This guide walks you through creating your first cube.
What is a Cube?
A cube is a semantic abstraction over your data that defines:
- Measures: Numeric aggregations (counts, sums, averages)
- Dimensions: Attributes to group and filter by
- Joins: Relationships to other cubes
- Pre-aggregations: Cached summaries for performance
┌─────────────────────────────────────────────────┐
│ Cube: Orders │
├─────────────────────────────────────────────────┤
│ Measures: │
│ • count (COUNT) │
│ • total_revenue (SUM of amount) │
│ • avg_order_value (AVG of amount) │
├─────────────────────────────────────────────────┤
│ Dimensions: │
│ • order_id (primary key) │
│ • order_date (time) │
│ • status (string) │
│ • customer_id (number) │
└─────────────────────────────────────────────────┘
Prerequisites
Before creating a cube, you need:
- A Olytix Core project initialized (
olytix-core init) - A data source configured
- A model that transforms your raw data
If you don't have these yet, complete the Quick Start first.
Step 1: Understand Your Data
Let's say you have an orders table with this structure:
-- Your transformed model: fct_orders
SELECT
order_id,
customer_id,
product_id,
order_date,
quantity,
unit_price,
discount_amount,
total_amount,
status,
region
FROM transformed_orders
Step 2: Create the Cube File
Create cubes/orders.yml:
cubes:
- name: orders
# The SQL that defines the cube's base data
sql: "SELECT * FROM {{ ref('fct_orders') }}"
# Human-readable description
description: "Order transactions for e-commerce analytics"
# Metadata for governance
meta:
owner: analytics-team
certified: true
refresh_frequency: hourly
Step 3: Define Measures
Measures are the numeric values you want to aggregate. Add them to your cube:
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
description: "Order transactions for e-commerce analytics"
measures:
# Simple count of all orders
- name: count
type: count
description: "Total number of orders"
# Sum of order amounts
- name: total_revenue
type: sum
sql: total_amount
format: currency
description: "Total revenue from all orders"
# Average order value
- name: avg_order_value
type: avg
sql: total_amount
format: currency
description: "Average order amount"
# Count of distinct customers
- name: unique_customers
type: count_distinct
sql: customer_id
description: "Number of unique customers"
# Total units sold
- name: total_quantity
type: sum
sql: quantity
description: "Total units sold"
# Calculated measure (gross margin)
- name: gross_revenue
type: sum
sql: "total_amount - discount_amount"
format: currency
description: "Revenue after discounts"
Measure Types
| Type | Description | Example |
|---|---|---|
count | Count of rows | Total orders |
count_distinct | Count of unique values | Unique customers |
sum | Sum of values | Total revenue |
avg | Average of values | Average order value |
min | Minimum value | First order date |
max | Maximum value | Largest order |
number | Custom SQL expression | Calculated field |
Step 4: Define Dimensions
Dimensions are the attributes you use to slice and filter data:
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
description: "Order transactions for e-commerce analytics"
measures:
# ... measures from above
dimensions:
# Primary key (required)
- name: order_id
type: number
sql: order_id
primary_key: true
description: "Unique order identifier"
# Time dimension with granularities
- name: order_date
type: time
sql: order_date
granularities:
- day
- week
- month
- quarter
- year
description: "Date the order was placed"
# String dimension
- name: status
type: string
sql: status
description: "Order status (pending, completed, cancelled)"
# String dimension with case transformation
- name: region
type: string
sql: "UPPER(region)"
description: "Geographic region"
# Number dimension
- name: customer_id
type: number
sql: customer_id
description: "Customer identifier"
# Calculated dimension
- name: order_size
type: string
sql: |
CASE
WHEN total_amount < 50 THEN 'Small'
WHEN total_amount < 200 THEN 'Medium'
ELSE 'Large'
END
description: "Order size category"
# Boolean dimension
- name: has_discount
type: boolean
sql: "discount_amount > 0"
description: "Whether the order has a discount"
Dimension Types
| Type | Description | Example |
|---|---|---|
string | Text values | Status, region, category |
number | Numeric values | Customer ID, quantity |
time | Date/timestamp | Order date, created_at |
boolean | True/false | Is active, has discount |
geo | Geographic data | Location, coordinates |
Step 5: Compile and Validate
Compile your project to validate the cube definition:
olytix-core compile
Expected output:
Compiling project...
✓ Loaded 1 source with 1 table
✓ Compiled 1 model
✓ Registered 1 cube with 6 measures and 7 dimensions
✓ Built column-level lineage
✓ Generated manifest at target/manifest.json
Compilation completed in 0.38s
Validation Errors
If there are issues, you'll see helpful error messages:
Error in cubes/orders.yml:
Cube 'orders' has invalid measure 'total_revenue':
- SQL column 'total_amountt' not found in base query
Did you mean 'total_amount'?
Step 6: Test Your Cube
Start the API server and test your cube:
olytix-core serve
Query using curl:
curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue", "orders.count"],
"dimensions": ["orders.status"]
}'
Response:
{
"data": [
{ "orders.status": "completed", "orders.total_revenue": 125000.00, "orders.count": 1523 },
{ "orders.status": "pending", "orders.total_revenue": 15000.00, "orders.count": 187 },
{ "orders.status": "cancelled", "orders.total_revenue": 0, "orders.count": 45 }
],
"query": {
"sql": "SELECT status, SUM(total_amount) AS total_revenue, COUNT(*) AS count FROM fct_orders GROUP BY status",
"duration_ms": 23
}
}
Complete Cube Example
Here's the full cubes/orders.yml:
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
description: "Order transactions for e-commerce analytics"
meta:
owner: analytics-team
certified: true
refresh_frequency: hourly
measures:
- name: count
type: count
description: "Total number of orders"
- name: total_revenue
type: sum
sql: total_amount
format: currency
description: "Total revenue from all orders"
- name: avg_order_value
type: avg
sql: total_amount
format: currency
description: "Average order amount"
- name: unique_customers
type: count_distinct
sql: customer_id
description: "Number of unique customers"
- name: total_quantity
type: sum
sql: quantity
description: "Total units sold"
- name: gross_revenue
type: sum
sql: "total_amount - discount_amount"
format: currency
description: "Revenue after discounts"
dimensions:
- name: order_id
type: number
sql: order_id
primary_key: true
description: "Unique order identifier"
- name: order_date
type: time
sql: order_date
granularities:
- day
- week
- month
- quarter
- year
description: "Date the order was placed"
- name: status
type: string
sql: status
description: "Order status"
- name: region
type: string
sql: "UPPER(region)"
description: "Geographic region"
- name: customer_id
type: number
sql: customer_id
description: "Customer identifier"
- name: order_size
type: string
sql: |
CASE
WHEN total_amount < 50 THEN 'Small'
WHEN total_amount < 200 THEN 'Medium'
ELSE 'Large'
END
description: "Order size category"
- name: has_discount
type: boolean
sql: "discount_amount > 0"
description: "Whether the order has a discount"
Best Practices
Naming Conventions
- Use
snake_casefor all names - Use descriptive, business-friendly names
- Prefix measures with action:
total_,avg_,count_
Documentation
- Always include descriptions
- Add
metafor governance information - Document expected values for string dimensions
Performance
- Mark primary keys for efficient joins
- Use appropriate granularities for time dimensions
- Consider pre-aggregations for large datasets
Next Steps
Now that you've created your first cube: