Skip to main content

Your First Cube

For Data Analysts

Cubes are the heart of Olytix Core's semantic layer. They define how your data can be queried, what metrics are available, and how dimensions slice the data. This guide walks you through creating your first cube.

What is a Cube?

A cube is a semantic abstraction over your data that defines:

  • Measures: Numeric aggregations (counts, sums, averages)
  • Dimensions: Attributes to group and filter by
  • Joins: Relationships to other cubes
  • Pre-aggregations: Cached summaries for performance
┌─────────────────────────────────────────────────┐
│ Cube: Orders │
├─────────────────────────────────────────────────┤
│ Measures: │
│ • count (COUNT) │
│ • total_revenue (SUM of amount) │
│ • avg_order_value (AVG of amount) │
├─────────────────────────────────────────────────┤
│ Dimensions: │
│ • order_id (primary key) │
│ • order_date (time) │
│ • status (string) │
│ • customer_id (number) │
└─────────────────────────────────────────────────┘

Prerequisites

Before creating a cube, you need:

  1. A Olytix Core project initialized (olytix-core init)
  2. A data source configured
  3. A model that transforms your raw data

If you don't have these yet, complete the Quick Start first.

Step 1: Understand Your Data

Let's say you have an orders table with this structure:

-- Your transformed model: fct_orders
SELECT
order_id,
customer_id,
product_id,
order_date,
quantity,
unit_price,
discount_amount,
total_amount,
status,
region
FROM transformed_orders

Step 2: Create the Cube File

Create cubes/orders.yml:

cubes:
- name: orders
# The SQL that defines the cube's base data
sql: "SELECT * FROM {{ ref('fct_orders') }}"

# Human-readable description
description: "Order transactions for e-commerce analytics"

# Metadata for governance
meta:
owner: analytics-team
certified: true
refresh_frequency: hourly

Step 3: Define Measures

Measures are the numeric values you want to aggregate. Add them to your cube:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
description: "Order transactions for e-commerce analytics"

measures:
# Simple count of all orders
- name: count
type: count
description: "Total number of orders"

# Sum of order amounts
- name: total_revenue
type: sum
sql: total_amount
format: currency
description: "Total revenue from all orders"

# Average order value
- name: avg_order_value
type: avg
sql: total_amount
format: currency
description: "Average order amount"

# Count of distinct customers
- name: unique_customers
type: count_distinct
sql: customer_id
description: "Number of unique customers"

# Total units sold
- name: total_quantity
type: sum
sql: quantity
description: "Total units sold"

# Calculated measure (gross margin)
- name: gross_revenue
type: sum
sql: "total_amount - discount_amount"
format: currency
description: "Revenue after discounts"

Measure Types

TypeDescriptionExample
countCount of rowsTotal orders
count_distinctCount of unique valuesUnique customers
sumSum of valuesTotal revenue
avgAverage of valuesAverage order value
minMinimum valueFirst order date
maxMaximum valueLargest order
numberCustom SQL expressionCalculated field

Step 4: Define Dimensions

Dimensions are the attributes you use to slice and filter data:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
description: "Order transactions for e-commerce analytics"

measures:
# ... measures from above

dimensions:
# Primary key (required)
- name: order_id
type: number
sql: order_id
primary_key: true
description: "Unique order identifier"

# Time dimension with granularities
- name: order_date
type: time
sql: order_date
granularities:
- day
- week
- month
- quarter
- year
description: "Date the order was placed"

# String dimension
- name: status
type: string
sql: status
description: "Order status (pending, completed, cancelled)"

# String dimension with case transformation
- name: region
type: string
sql: "UPPER(region)"
description: "Geographic region"

# Number dimension
- name: customer_id
type: number
sql: customer_id
description: "Customer identifier"

# Calculated dimension
- name: order_size
type: string
sql: |
CASE
WHEN total_amount < 50 THEN 'Small'
WHEN total_amount < 200 THEN 'Medium'
ELSE 'Large'
END
description: "Order size category"

# Boolean dimension
- name: has_discount
type: boolean
sql: "discount_amount > 0"
description: "Whether the order has a discount"

Dimension Types

TypeDescriptionExample
stringText valuesStatus, region, category
numberNumeric valuesCustomer ID, quantity
timeDate/timestampOrder date, created_at
booleanTrue/falseIs active, has discount
geoGeographic dataLocation, coordinates

Step 5: Compile and Validate

Compile your project to validate the cube definition:

olytix-core compile

Expected output:

Compiling project...
✓ Loaded 1 source with 1 table
✓ Compiled 1 model
✓ Registered 1 cube with 6 measures and 7 dimensions
✓ Built column-level lineage
✓ Generated manifest at target/manifest.json

Compilation completed in 0.38s

Validation Errors

If there are issues, you'll see helpful error messages:

Error in cubes/orders.yml:
Cube 'orders' has invalid measure 'total_revenue':
- SQL column 'total_amountt' not found in base query
Did you mean 'total_amount'?

Step 6: Test Your Cube

Start the API server and test your cube:

olytix-core serve

Query using curl:

curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue", "orders.count"],
"dimensions": ["orders.status"]
}'

Response:

{
"data": [
{ "orders.status": "completed", "orders.total_revenue": 125000.00, "orders.count": 1523 },
{ "orders.status": "pending", "orders.total_revenue": 15000.00, "orders.count": 187 },
{ "orders.status": "cancelled", "orders.total_revenue": 0, "orders.count": 45 }
],
"query": {
"sql": "SELECT status, SUM(total_amount) AS total_revenue, COUNT(*) AS count FROM fct_orders GROUP BY status",
"duration_ms": 23
}
}

Complete Cube Example

Here's the full cubes/orders.yml:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
description: "Order transactions for e-commerce analytics"

meta:
owner: analytics-team
certified: true
refresh_frequency: hourly

measures:
- name: count
type: count
description: "Total number of orders"

- name: total_revenue
type: sum
sql: total_amount
format: currency
description: "Total revenue from all orders"

- name: avg_order_value
type: avg
sql: total_amount
format: currency
description: "Average order amount"

- name: unique_customers
type: count_distinct
sql: customer_id
description: "Number of unique customers"

- name: total_quantity
type: sum
sql: quantity
description: "Total units sold"

- name: gross_revenue
type: sum
sql: "total_amount - discount_amount"
format: currency
description: "Revenue after discounts"

dimensions:
- name: order_id
type: number
sql: order_id
primary_key: true
description: "Unique order identifier"

- name: order_date
type: time
sql: order_date
granularities:
- day
- week
- month
- quarter
- year
description: "Date the order was placed"

- name: status
type: string
sql: status
description: "Order status"

- name: region
type: string
sql: "UPPER(region)"
description: "Geographic region"

- name: customer_id
type: number
sql: customer_id
description: "Customer identifier"

- name: order_size
type: string
sql: |
CASE
WHEN total_amount < 50 THEN 'Small'
WHEN total_amount < 200 THEN 'Medium'
ELSE 'Large'
END
description: "Order size category"

- name: has_discount
type: boolean
sql: "discount_amount > 0"
description: "Whether the order has a discount"

Best Practices

Naming Conventions

  • Use snake_case for all names
  • Use descriptive, business-friendly names
  • Prefix measures with action: total_, avg_, count_

Documentation

  • Always include descriptions
  • Add meta for governance information
  • Document expected values for string dimensions

Performance

  • Mark primary keys for efficient joins
  • Use appropriate granularities for time dimensions
  • Consider pre-aggregations for large datasets

Next Steps

Now that you've created your first cube:

  1. Run queries against your cube →
  2. Add joins to connect cubes →
  3. Create metrics from measures →
  4. Optimize with pre-aggregations →