Skip to main content

Joins

For Data Analysts

Joins define relationships between cubes, enabling queries that combine measures and dimensions from multiple cubes. Olytix Core supports standard SQL join types along with semantic relationship types.

What is a Cube Join?

A join connects two cubes through a relationship, allowing you to:

  • Query measures from one cube with dimensions from another
  • Build a connected semantic model across your data
  • Maintain referential integrity between entities

Cube Joins & Relationships

Define relationships between cubes for multi-cube queries.

🧊orderscube
PKorder_idstring
FKcustomer_idstring
order_datetime
order_amountnumber
statusstring
N―――1
Join Type
belongs_to
Many orders belong to one customer
🧊customerscube
PKcustomer_idstring
customer_namestring
emailstring
created_attime
segmentstring
Legend
Orders Cube
Customers Cube
Join Relationship
Primary/Foreign Key

Join Types

Olytix Core supports the following join types:

SQL Join Types

TypeDescriptionUse Case
leftInclude all rows from source cubeMost common for analytics
innerOnly matching rows from both cubesWhen relationship is required
fullAll rows from both cubesComplete data union
rightAll rows from target cubeRarely used

Semantic Relationship Types

TypeSQL EquivalentCardinalityExample
belongs_toLEFT JOINMany-to-oneOrders belong to customers
has_manyLEFT JOINOne-to-manyCustomers have many orders
has_oneLEFT JOINOne-to-oneOrder has one shipment

Basic Join Definition

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

joins:
- name: customers
sql: "{orders}.customer_id = {customers}.customer_id"
relationship: belongs_to

Join Configuration Options

Each join supports these properties:

joins:
- name: customers # Target cube name
sql: "{orders}.customer_id = {customers}.customer_id" # ON clause
relationship: belongs_to # Join type or relationship
description: "Link to customer who placed the order"

LEFT JOIN

Left joins include all rows from the source cube, with NULL values for unmatched target cube columns:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

joins:
- name: customers
sql: "{orders}.customer_id = {customers}.customer_id"
relationship: left
description: "Customer information (NULL if customer deleted)"

Generated SQL:

SELECT ...
FROM fct_orders AS orders
LEFT JOIN dim_customers AS customers
ON orders.customer_id = customers.customer_id

INNER JOIN

Inner joins only include rows that have matches in both cubes:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

joins:
- name: customers
sql: "{orders}.customer_id = {customers}.customer_id"
relationship: inner
description: "Only orders with valid customers"

Generated SQL:

SELECT ...
FROM fct_orders AS orders
INNER JOIN dim_customers AS customers
ON orders.customer_id = customers.customer_id

Use INNER JOIN when:

  • The relationship is mandatory
  • You want to filter out orphan records
  • Data integrity requires matching records

FULL JOIN

Full joins include all rows from both cubes:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

joins:
- name: returns
sql: "{orders}.order_id = {returns}.order_id"
relationship: full
description: "All orders and all returns"

Generated SQL:

SELECT ...
FROM fct_orders AS orders
FULL OUTER JOIN fct_returns AS returns
ON orders.order_id = returns.order_id

Use FULL JOIN when:

  • Comparing two related datasets
  • Finding unmatched records on both sides
  • Creating reconciliation reports

Semantic Relationships

belongs_to (Many-to-One)

Use when many records in the source cube relate to one record in the target:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

joins:
# Many orders belong to one customer
- name: customers
sql: "{orders}.customer_id = {customers}.customer_id"
relationship: belongs_to
description: "Customer who placed the order"

# Many orders belong to one product
- name: products
sql: "{orders}.product_id = {products}.product_id"
relationship: belongs_to
description: "Product that was ordered"

has_many (One-to-Many)

Use when one record in the source cube relates to many records in the target:

cubes:
- name: customers
sql: "SELECT * FROM {{ ref('dim_customers') }}"

joins:
# One customer has many orders
- name: orders
sql: "{customers}.customer_id = {orders}.customer_id"
relationship: has_many
description: "Orders placed by this customer"

# One customer has many support tickets
- name: support_tickets
sql: "{customers}.customer_id = {support_tickets}.customer_id"
relationship: has_many
description: "Support tickets from this customer"

has_one (One-to-One)

Use when one record in the source cube relates to exactly one record in the target:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

joins:
# One order has one shipment
- name: shipments
sql: "{orders}.order_id = {shipments}.order_id"
relationship: has_one
description: "Shipment for this order"

# One order has one invoice
- name: invoices
sql: "{orders}.order_id = {invoices}.order_id"
relationship: has_one
description: "Invoice for this order"

Complex Join Conditions

Multi-Column Joins

Join on multiple columns:

joins:
- name: inventory
sql: |
{orders}.product_id = {inventory}.product_id
AND {orders}.warehouse_id = {inventory}.warehouse_id
relationship: belongs_to
description: "Inventory record for product at warehouse"

Joins with Expressions

Include expressions in join conditions:

joins:
- name: price_history
sql: |
{orders}.product_id = {price_history}.product_id
AND {orders}.order_date >= {price_history}.effective_from
AND {orders}.order_date < COALESCE({price_history}.effective_to, '9999-12-31')
relationship: belongs_to
description: "Price effective at time of order"

Self-Referential Joins

Join a cube to itself:

cubes:
- name: employees
sql: "SELECT * FROM {{ ref('dim_employees') }}"

joins:
- name: manager
sql: "{employees}.manager_id = {manager}.employee_id"
relationship: belongs_to
description: "Direct manager of this employee"

Join Chains

Olytix Core supports transitive joins through multiple cubes:

cubes:
- name: order_items
sql: "SELECT * FROM {{ ref('fct_order_items') }}"

joins:
- name: orders
sql: "{order_items}.order_id = {orders}.order_id"
relationship: belongs_to

- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

joins:
- name: customers
sql: "{orders}.customer_id = {customers}.customer_id"
relationship: belongs_to

This allows querying:

{
"measures": ["order_items.total_quantity"],
"dimensions": ["customers.region"]
}

Olytix Core automatically resolves the join path: order_items -> orders -> customers

Complete Example

Here is a comprehensive example with multiple cube relationships:

# cubes/orders.yml
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
description: "E-commerce order transactions"

measures:
- name: count
type: count
description: "Total number of orders"

- name: total_revenue
type: sum
sql: total_amount
format: currency
description: "Total revenue"

dimensions:
- name: order_id
type: number
sql: order_id
primary_key: true

- name: order_date
type: time
sql: order_date
granularities: [day, week, month, quarter, year]

- name: status
type: string
sql: status

joins:
- name: customers
sql: "{orders}.customer_id = {customers}.customer_id"
relationship: belongs_to
description: "Customer who placed the order"

- name: products
sql: "{orders}.product_id = {products}.product_id"
relationship: belongs_to
description: "Primary product in the order"

- name: shipments
sql: "{orders}.order_id = {shipments}.order_id"
relationship: has_one
description: "Shipment details"

# cubes/customers.yml
cubes:
- name: customers
sql: "SELECT * FROM {{ ref('dim_customers') }}"
description: "Customer master data"

measures:
- name: count
type: count
description: "Total customers"

dimensions:
- name: customer_id
type: number
sql: customer_id
primary_key: true

- name: name
type: string
sql: name

- name: email
type: string
sql: email

- name: region
type: string
sql: region

- name: created_at
type: time
sql: created_at
granularities: [day, month, year]

joins:
- name: orders
sql: "{customers}.customer_id = {orders}.customer_id"
relationship: has_many
description: "Orders placed by this customer"

# cubes/products.yml
cubes:
- name: products
sql: "SELECT * FROM {{ ref('dim_products') }}"
description: "Product catalog"

dimensions:
- name: product_id
type: number
sql: product_id
primary_key: true

- name: name
type: string
sql: name

- name: category
type: string
sql: category

- name: price
type: number
sql: price

# cubes/shipments.yml
cubes:
- name: shipments
sql: "SELECT * FROM {{ ref('fct_shipments') }}"
description: "Order shipment tracking"

dimensions:
- name: shipment_id
type: number
sql: shipment_id
primary_key: true

- name: carrier
type: string
sql: carrier

- name: shipped_at
type: time
sql: shipped_at
granularities: [day, week, month]

- name: delivered_at
type: time
sql: delivered_at
granularities: [day, week, month]

Querying with Joins

Once joins are defined, you can query across cubes:

{
"measures": ["orders.total_revenue"],
"dimensions": ["customers.region", "products.category"],
"filters": [
{
"dimension": "orders.order_date",
"operator": "gte",
"values": ["2024-01-01"]
}
]
}

Best Practices

Define Joins on the "Many" Side

For belongs_to relationships, define the join on the cube with many records:

# Good: Join defined on orders (many) pointing to customers (one)
cubes:
- name: orders
joins:
- name: customers
relationship: belongs_to

Use Primary Keys

Ensure both cubes have primary key dimensions for efficient joins:

dimensions:
- name: customer_id
type: number
sql: customer_id
primary_key: true

Document Relationships

Include descriptions explaining the relationship:

joins:
- name: customers
sql: "{orders}.customer_id = {customers}.customer_id"
relationship: belongs_to
description: |
Links to the customer who placed the order.
NULL if customer record has been deleted.

Avoid Circular Joins

Do not create circular join paths:

# Avoid this pattern
cubes:
- name: orders
joins:
- name: customers # orders -> customers

- name: customers
joins:
- name: orders # customers -> orders (creates cycle)

Instead, define the relationship from one direction only, or use separate cubes for different query patterns.

Consider Join Performance

For large tables, ensure join columns are indexed in your database:

-- In your database
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);

Next Steps