Joins

For Data Analysts

Joins define relationships between cubes, enabling queries that combine measures and dimensions from multiple cubes. Olytix Core supports standard SQL join types along with semantic relationship types.

What is a Cube Join?

A join connects two cubes through a relationship, allowing you to:

Query measures from one cube with dimensions from another
Build a connected semantic model across your data
Maintain referential integrity between entities

Cube Joins & Relationships

Define relationships between cubes for multi-cube queries.

🧊orderscube

PKorder_idstring

FKcustomer_idstring

order_datetime

order_amountnumber

statusstring

N―――1

Join Type

belongs_to

Many orders belong to one customer

🧊customerscube

PKcustomer_idstring

customer_namestring

emailstring

created_attime

segmentstring

Legend

Orders Cube

Customers Cube

Join Relationship

Primary/Foreign Key

Join Types

Olytix Core supports the following join types:

SQL Join Types

Type	Description	Use Case
`left`	Include all rows from source cube	Most common for analytics
`inner`	Only matching rows from both cubes	When relationship is required
`full`	All rows from both cubes	Complete data union
`right`	All rows from target cube	Rarely used

Semantic Relationship Types

Type	SQL Equivalent	Cardinality	Example
`belongs_to`	LEFT JOIN	Many-to-one	Orders belong to customers
`has_many`	LEFT JOIN	One-to-many	Customers have many orders
`has_one`	LEFT JOIN	One-to-one	Order has one shipment

Basic Join Definition

cubes:
  - name: orders
    sql: "SELECT * FROM {{ ref('fct_orders') }}"

    joins:
      - name: customers
        sql: "{orders}.customer_id = {customers}.customer_id"
        relationship: belongs_to

Join Configuration Options

Each join supports these properties:

joins:
  - name: customers              # Target cube name
    sql: "{orders}.customer_id = {customers}.customer_id"  # ON clause
    relationship: belongs_to     # Join type or relationship
    description: "Link to customer who placed the order"

LEFT JOIN

Left joins include all rows from the source cube, with NULL values for unmatched target cube columns:

cubes:
  - name: orders
    sql: "SELECT * FROM {{ ref('fct_orders') }}"

    joins:
      - name: customers
        sql: "{orders}.customer_id = {customers}.customer_id"
        relationship: left
        description: "Customer information (NULL if customer deleted)"

Generated SQL:

SELECT ...
FROM fct_orders AS orders
LEFT JOIN dim_customers AS customers
  ON orders.customer_id = customers.customer_id

INNER JOIN

Inner joins only include rows that have matches in both cubes:

cubes:
  - name: orders
    sql: "SELECT * FROM {{ ref('fct_orders') }}"

    joins:
      - name: customers
        sql: "{orders}.customer_id = {customers}.customer_id"
        relationship: inner
        description: "Only orders with valid customers"

Generated SQL:

SELECT ...
FROM fct_orders AS orders
INNER JOIN dim_customers AS customers
  ON orders.customer_id = customers.customer_id

Use INNER JOIN when:

The relationship is mandatory
You want to filter out orphan records
Data integrity requires matching records

FULL JOIN

Full joins include all rows from both cubes:

cubes:
  - name: orders
    sql: "SELECT * FROM {{ ref('fct_orders') }}"

    joins:
      - name: returns
        sql: "{orders}.order_id = {returns}.order_id"
        relationship: full
        description: "All orders and all returns"

Generated SQL:

SELECT ...
FROM fct_orders AS orders
FULL OUTER JOIN fct_returns AS returns
  ON orders.order_id = returns.order_id

Use FULL JOIN when:

Comparing two related datasets
Finding unmatched records on both sides
Creating reconciliation reports

Semantic Relationships

belongs_to (Many-to-One)

Use when many records in the source cube relate to one record in the target:

cubes:
  - name: orders
    sql: "SELECT * FROM {{ ref('fct_orders') }}"

    joins:
      # Many orders belong to one customer
      - name: customers
        sql: "{orders}.customer_id = {customers}.customer_id"
        relationship: belongs_to
        description: "Customer who placed the order"

      # Many orders belong to one product
      - name: products
        sql: "{orders}.product_id = {products}.product_id"
        relationship: belongs_to
        description: "Product that was ordered"

has_many (One-to-Many)

Use when one record in the source cube relates to many records in the target:

cubes:
  - name: customers
    sql: "SELECT * FROM {{ ref('dim_customers') }}"

    joins:
      # One customer has many orders
      - name: orders
        sql: "{customers}.customer_id = {orders}.customer_id"
        relationship: has_many
        description: "Orders placed by this customer"

      # One customer has many support tickets
      - name: support_tickets
        sql: "{customers}.customer_id = {support_tickets}.customer_id"
        relationship: has_many
        description: "Support tickets from this customer"

has_one (One-to-One)

Use when one record in the source cube relates to exactly one record in the target:

cubes:
  - name: orders
    sql: "SELECT * FROM {{ ref('fct_orders') }}"

    joins:
      # One order has one shipment
      - name: shipments
        sql: "{orders}.order_id = {shipments}.order_id"
        relationship: has_one
        description: "Shipment for this order"

      # One order has one invoice
      - name: invoices
        sql: "{orders}.order_id = {invoices}.order_id"
        relationship: has_one
        description: "Invoice for this order"

Complex Join Conditions

Multi-Column Joins

Join on multiple columns:

joins:
  - name: inventory
    sql: |
      {orders}.product_id = {inventory}.product_id
      AND {orders}.warehouse_id = {inventory}.warehouse_id
    relationship: belongs_to
    description: "Inventory record for product at warehouse"

Joins with Expressions

Include expressions in join conditions:

joins:
  - name: price_history
    sql: |
      {orders}.product_id = {price_history}.product_id
      AND {orders}.order_date >= {price_history}.effective_from
      AND {orders}.order_date < COALESCE({price_history}.effective_to, '9999-12-31')
    relationship: belongs_to
    description: "Price effective at time of order"

Self-Referential Joins

Join a cube to itself:

cubes:
  - name: employees
    sql: "SELECT * FROM {{ ref('dim_employees') }}"

    joins:
      - name: manager
        sql: "{employees}.manager_id = {manager}.employee_id"
        relationship: belongs_to
        description: "Direct manager of this employee"

Join Chains

Olytix Core supports transitive joins through multiple cubes:

cubes:
  - name: order_items
    sql: "SELECT * FROM {{ ref('fct_order_items') }}"

    joins:
      - name: orders
        sql: "{order_items}.order_id = {orders}.order_id"
        relationship: belongs_to

  - name: orders
    sql: "SELECT * FROM {{ ref('fct_orders') }}"

    joins:
      - name: customers
        sql: "{orders}.customer_id = {customers}.customer_id"
        relationship: belongs_to

This allows querying:

{
  "measures": ["order_items.total_quantity"],
  "dimensions": ["customers.region"]
}

Olytix Core automatically resolves the join path: order_items -> orders -> customers

Complete Example

Here is a comprehensive example with multiple cube relationships:

# cubes/orders.yml
cubes:
  - name: orders
    sql: "SELECT * FROM {{ ref('fct_orders') }}"
    description: "E-commerce order transactions"

    measures:
      - name: count
        type: count
        description: "Total number of orders"

      - name: total_revenue
        type: sum
        sql: total_amount
        format: currency
        description: "Total revenue"

    dimensions:
      - name: order_id
        type: number
        sql: order_id
        primary_key: true

      - name: order_date
        type: time
        sql: order_date
        granularities: [day, week, month, quarter, year]

      - name: status
        type: string
        sql: status

    joins:
      - name: customers
        sql: "{orders}.customer_id = {customers}.customer_id"
        relationship: belongs_to
        description: "Customer who placed the order"

      - name: products
        sql: "{orders}.product_id = {products}.product_id"
        relationship: belongs_to
        description: "Primary product in the order"

      - name: shipments
        sql: "{orders}.order_id = {shipments}.order_id"
        relationship: has_one
        description: "Shipment details"

# cubes/customers.yml
cubes:
  - name: customers
    sql: "SELECT * FROM {{ ref('dim_customers') }}"
    description: "Customer master data"

    measures:
      - name: count
        type: count
        description: "Total customers"

    dimensions:
      - name: customer_id
        type: number
        sql: customer_id
        primary_key: true

      - name: name
        type: string
        sql: name

      - name: email
        type: string
        sql: email

      - name: region
        type: string
        sql: region

      - name: created_at
        type: time
        sql: created_at
        granularities: [day, month, year]

    joins:
      - name: orders
        sql: "{customers}.customer_id = {orders}.customer_id"
        relationship: has_many
        description: "Orders placed by this customer"

# cubes/products.yml
cubes:
  - name: products
    sql: "SELECT * FROM {{ ref('dim_products') }}"
    description: "Product catalog"

    dimensions:
      - name: product_id
        type: number
        sql: product_id
        primary_key: true

      - name: name
        type: string
        sql: name

      - name: category
        type: string
        sql: category

      - name: price
        type: number
        sql: price

# cubes/shipments.yml
cubes:
  - name: shipments
    sql: "SELECT * FROM {{ ref('fct_shipments') }}"
    description: "Order shipment tracking"

    dimensions:
      - name: shipment_id
        type: number
        sql: shipment_id
        primary_key: true

      - name: carrier
        type: string
        sql: carrier

      - name: shipped_at
        type: time
        sql: shipped_at
        granularities: [day, week, month]

      - name: delivered_at
        type: time
        sql: delivered_at
        granularities: [day, week, month]

Querying with Joins

Once joins are defined, you can query across cubes:

{
  "measures": ["orders.total_revenue"],
  "dimensions": ["customers.region", "products.category"],
  "filters": [
    {
      "dimension": "orders.order_date",
      "operator": "gte",
      "values": ["2024-01-01"]
    }
  ]
}

Best Practices

Define Joins on the "Many" Side

For belongs_to relationships, define the join on the cube with many records:

# Good: Join defined on orders (many) pointing to customers (one)
cubes:
  - name: orders
    joins:
      - name: customers
        relationship: belongs_to

Use Primary Keys

Ensure both cubes have primary key dimensions for efficient joins:

dimensions:
  - name: customer_id
    type: number
    sql: customer_id
    primary_key: true

Document Relationships

Include descriptions explaining the relationship:

joins:
  - name: customers
    sql: "{orders}.customer_id = {customers}.customer_id"
    relationship: belongs_to
    description: |
      Links to the customer who placed the order.
      NULL if customer record has been deleted.

Avoid Circular Joins

Do not create circular join paths:

# Avoid this pattern
cubes:
  - name: orders
    joins:
      - name: customers  # orders -> customers

  - name: customers
    joins:
      - name: orders  # customers -> orders (creates cycle)

Instead, define the relationship from one direction only, or use separate cubes for different query patterns.

Consider Join Performance

For large tables, ensure join columns are indexed in your database:

-- In your database
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);

Next Steps

Pre-aggregations - Optimize multi-cube queries
Cube Fundamentals - Review cube basics
Measures - Define aggregations for joined queries

What is a Cube Join?​

Cube Joins & Relationships

Join Types​

SQL Join Types​

Semantic Relationship Types​

Basic Join Definition​

Join Configuration Options​

LEFT JOIN​

INNER JOIN​

FULL JOIN​

Semantic Relationships​

belongs_to (Many-to-One)​

has_many (One-to-Many)​

has_one (One-to-One)​

Complex Join Conditions​

Multi-Column Joins​

Joins with Expressions​

Self-Referential Joins​

Join Chains​

Complete Example​

Querying with Joins​

Best Practices​

Define Joins on the "Many" Side​

Use Primary Keys​

Document Relationships​

Avoid Circular Joins​

Consider Join Performance​

Next Steps​

What is a Cube Join?

Join Types

SQL Join Types

Semantic Relationship Types

Basic Join Definition

Join Configuration Options

LEFT JOIN

INNER JOIN

FULL JOIN

Semantic Relationships

belongs_to (Many-to-One)

has_many (One-to-Many)

has_one (One-to-One)

Complex Join Conditions

Multi-Column Joins

Joins with Expressions

Self-Referential Joins

Join Chains

Complete Example

Querying with Joins

Best Practices

Define Joins on the "Many" Side

Use Primary Keys

Document Relationships

Avoid Circular Joins

Consider Join Performance

Next Steps