Query Optimization

For Data Analysts

Optimizing query performance is essential for responsive analytics. This page covers pre-aggregations, caching strategies, and query design patterns that improve execution speed.

Optimization Overview

Olytix Core provides several optimization layers:

┌─────────────────────────────────────────────────────────────────┐
│                      Query Request                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌──────────────────────────────────────────────────────────┐  │
│   │  Result Cache                                             │  │
│   │  Cached query results with TTL                            │  │
│   └──────────────────────────────────────────────────────────┘  │
│                            │                                     │
│                     cache miss                                   │
│                            ▼                                     │
│   ┌──────────────────────────────────────────────────────────┐  │
│   │  Pre-aggregation Matching                                 │  │
│   │  Route to pre-computed tables if available                │  │
│   └──────────────────────────────────────────────────────────┘  │
│                            │                                     │
│                    no match                                      │
│                            ▼                                     │
│   ┌──────────────────────────────────────────────────────────┐  │
│   │  Query Planner                                            │  │
│   │  Optimized SQL generation                                 │  │
│   └──────────────────────────────────────────────────────────┘  │
│                            │                                     │
│                            ▼                                     │
│   ┌──────────────────────────────────────────────────────────┐  │
│   │  Warehouse Execution                                      │  │
│   │  Execute against data warehouse                           │  │
│   └──────────────────────────────────────────────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Pre-aggregations

Pre-aggregations are pre-computed summary tables that dramatically speed up common queries by avoiding full table scans.

How Pre-aggregations Work

Olytix Core analyzes your cube definitions for pre-aggregation configurations
Builds materialized tables with aggregated data
Automatically routes matching queries to pre-aggregation tables
Refreshes data on schedule or on-demand

Defining Pre-aggregations

Add pre-aggregations to your cube definition:

cubes:
  - name: orders
    sql: "SELECT * FROM {{ ref('fct_orders') }}"

    measures:
      - name: total_revenue
        type: sum
        sql: total_amount

      - name: order_count
        type: count

    dimensions:
      - name: region
        type: string
        sql: region

      - name: order_date
        type: time
        sql: order_date

    pre_aggregations:
      - name: orders_by_region_daily
        measures:
          - total_revenue
          - order_count
        dimensions:
          - region
        time_dimension: order_date
        granularity: day
        refresh_key:
          every: "1 hour"

Pre-aggregation Types

Rollup

Aggregates data by specified dimensions and time granularity:

pre_aggregations:
  - name: monthly_region_rollup
    type: rollup
    measures:
      - total_revenue
      - order_count
    dimensions:
      - region
      - status
    time_dimension: order_date
    granularity: month

Original SQL

Stores the original cube SQL result for queries without aggregation:

pre_aggregations:
  - name: recent_orders
    type: original_sql
    partition_granularity: day
    time_dimension: order_date
    refresh_key:
      every: "30 minutes"

Rollup Join

Pre-aggregates data across joined cubes:

pre_aggregations:
  - name: orders_with_customers
    type: rollup_join
    measures:
      - orders.total_revenue
    dimensions:
      - orders.region
      - customers.segment
    rollups:
      - orders.orders_by_region_daily
      - customers.customers_by_segment

Pre-aggregation Matching

Olytix Core automatically matches queries to pre-aggregations when:

All requested measures are in the pre-aggregation
All requested dimensions are in the pre-aggregation
Time granularity is equal or coarser than pre-aggregation
Pre-aggregation is fresh (not stale)

Query that matches:

{
  "measures": ["orders.total_revenue"],
  "dimensions": ["orders.region"],
  "time_dimensions": [
    {
      "dimension": "orders.order_date",
      "granularity": "month"
    }
  ]
}

This matches orders_by_region_daily because:

total_revenue is included
region is included
month is coarser than day (can aggregate daily to monthly)

Checking Pre-aggregation Usage

Use the explain endpoint to verify pre-aggregation routing:

curl -X POST http://localhost:8000/api/v1/query/explain \
  -H "Content-Type: application/json" \
  -d '{
    "measures": ["orders.total_revenue"],
    "dimensions": ["orders.region"]
  }'

Response:

{
  "cubes": ["orders"],
  "uses_pre_aggregation": true,
  "pre_aggregation": "orders_by_region_daily",
  "steps": [
    {
      "type": "pre_aggregation",
      "description": "Use pre-aggregation: orders_by_region_daily"
    }
  ]
}

Managing Pre-aggregations

Build Pre-aggregations

# Build all pre-aggregations for a cube
curl -X POST http://localhost:8000/api/v1/preaggregation/build \
  -H "Content-Type: application/json" \
  -d '{
    "cube_name": "orders",
    "force": false
  }'

Build Specific Pre-aggregation

curl -X POST http://localhost:8000/api/v1/preaggregation/build \
  -H "Content-Type: application/json" \
  -d '{
    "cube_name": "orders",
    "preagg_name": "orders_by_region_daily",
    "force": true
  }'

Check Pre-aggregation Status

curl http://localhost:8000/api/v1/preaggregation/status/orders/orders_by_region_daily

Response:

{
  "cube_name": "orders",
  "preagg_name": "orders_by_region_daily",
  "status": "ready",
  "last_built_at": "2024-01-20T10:00:00Z",
  "row_count": 50000,
  "size_bytes": 2500000,
  "is_stale": false
}

Invalidate Pre-aggregation

Force rebuild on next query:

curl -X POST http://localhost:8000/api/v1/preaggregation/invalidate \
  -H "Content-Type: application/json" \
  -d '{
    "cube_name": "orders",
    "preagg_name": "orders_by_region_daily"
  }'

Refresh Strategies

Scheduled Refresh

Refresh on a fixed schedule:

pre_aggregations:
  - name: daily_summary
    measures: [total_revenue]
    dimensions: [region]
    time_dimension: order_date
    granularity: day
    refresh_key:
      every: "1 hour"

SQL-based Refresh

Refresh when source data changes:

pre_aggregations:
  - name: orders_summary
    measures: [total_revenue]
    dimensions: [region]
    refresh_key:
      sql: "SELECT MAX(updated_at) FROM fct_orders"

Incremental Refresh

Only process new data:

pre_aggregations:
  - name: orders_daily
    measures: [total_revenue]
    dimensions: [region]
    time_dimension: order_date
    granularity: day
    partition_granularity: month
    build_range_start: "2024-01-01"
    build_range_end: "now"
    incremental: true
    update_window: "7 days"

Result Caching

Olytix Core caches query results to serve identical queries instantly.

Cache Behavior

Queries with identical parameters return cached results
Cache TTL is configurable (default: 5 minutes)
Cache is automatically invalidated when pre-aggregations rebuild
Different users may have different cache entries (due to security context)

Cache Status in Response

{
  "data": [...],
  "execution_time_ms": 2,
  "cached": true,
  "cache_key": "q_abc123"
}

Bypassing Cache

Force fresh execution:

curl -X POST "http://localhost:8000/api/v1/query?use_cache=false" \
  -H "Content-Type: application/json" \
  -d '{
    "measures": ["orders.total_revenue"],
    "dimensions": ["orders.region"]
  }'

Cache Configuration

Set cache TTL in your configuration:

# olytix-core_project.yml
cache:
  default_ttl_seconds: 300
  max_entries: 10000

Query Design Patterns

Limit Dimensions

Fewer dimensions = faster queries:

// Slower: Many dimensions
{
  "measures": ["orders.total_revenue"],
  "dimensions": ["orders.region", "orders.status", "orders.payment_method", "orders.shipping_method"]
}

// Faster: Focused dimensions
{
  "measures": ["orders.total_revenue"],
  "dimensions": ["orders.region"]
}

Filter Early

Apply filters to reduce data volume:

{
  "measures": ["orders.total_revenue"],
  "dimensions": ["orders.customer_id"],
  "filters": [
    {
      "member": "orders.order_date",
      "operator": "inDateRange",
      "values": ["2024-01-01", "2024-01-31"]
    }
  ]
}

Use Appropriate Granularity

Match granularity to analysis needs:

// Daily granularity for detailed analysis
{
  "time_dimensions": [
    { "dimension": "orders.order_date", "granularity": "day" }
  ]
}

// Monthly granularity for trend analysis (faster)
{
  "time_dimensions": [
    { "dimension": "orders.order_date", "granularity": "month" }
  ]
}

Pagination for Large Results

Avoid loading massive result sets:

{
  "measures": ["orders.total_revenue"],
  "dimensions": ["orders.customer_id"],
  "order": [{ "member": "orders.total_revenue", "direction": "desc" }],
  "limit": 100,
  "offset": 0
}

Use Async for Heavy Queries

Route large queries to async execution:

curl -X POST http://localhost:8000/api/v1/query/async \
  -H "Content-Type: application/json" \
  -d '{
    "measures": ["orders.total_revenue"],
    "dimensions": ["orders.customer_id"],
    "limit": 500000
  }'

Cost Estimation

Before running expensive queries, estimate the cost:

curl -X POST http://localhost:8000/api/v1/query/estimate \
  -H "Content-Type: application/json" \
  -d '{
    "measures": ["orders.total_revenue"],
    "dimensions": ["orders.customer_id", "orders.order_date.day"]
  }'

Response:

{
  "estimated_rows": 500000,
  "estimated_bytes": 50000000,
  "estimated_cost": 500.0,
  "estimated_time_ms": 5000,
  "uses_pre_aggregation": false,
  "execution_location": "warehouse"
}

Use this to:

Warn users before expensive queries
Automatically route to async execution
Plan pre-aggregation coverage

Monitoring Query Performance

Query Metrics

Track query performance over time:

curl http://localhost:8000/api/v1/metrics/query

Response:

{
  "total_queries": 15420,
  "cache_hit_rate": 0.72,
  "avg_execution_time_ms": 145,
  "p95_execution_time_ms": 890,
  "pre_aggregation_hit_rate": 0.65,
  "queries_by_cube": {
    "orders": 8540,
    "customers": 4200,
    "products": 2680
  }
}

Pre-aggregation Metrics

Monitor pre-aggregation effectiveness:

curl http://localhost:8000/api/v1/preaggregation/metrics

Response:

{
  "total_preaggregations": 12,
  "ready_count": 10,
  "building_count": 1,
  "stale_count": 1,
  "total_size_bytes": 125000000,
  "cache_hit_rate": 0.78,
  "avg_build_time_seconds": 45
}

Optimization Checklist

Use this checklist to optimize your queries:

Pre-aggregations defined for common query patterns
Time dimension granularity matches pre-aggregations
Filters applied to reduce data volume
Pagination used for large result sets
Cache enabled for repeated queries
Async execution used for heavy queries
Query cost estimated before execution
Performance metrics monitored regularly

Pre-aggregation Design Tips

Cover common queries - Analyze query logs to identify patterns
Match dashboard granularity - Pre-aggregate at the granularity users need
Include filter dimensions - Add dimensions commonly used in filters
Balance storage vs speed - More granular = larger tables
Schedule off-peak refreshes - Avoid contention with user queries
Monitor staleness - Set appropriate refresh intervals

Troubleshooting Slow Queries

Query Not Using Pre-aggregation

Check why with explain:

curl -X POST http://localhost:8000/api/v1/query/explain \
  -H "Content-Type: application/json" \
  -d '{"measures": ["orders.total_revenue"], "dimensions": ["orders.customer_id"]}'

Common reasons:

Dimension not in pre-aggregation
Granularity too fine
Pre-aggregation is stale or building
Multiple cubes without rollup_join

High Cache Miss Rate

Causes and solutions:

High query variety: Standardize common queries
Short TTL: Increase cache duration
Unique parameters: Remove unnecessary variation
Security context: Expected for multi-tenant

Pre-aggregation Build Failures

Check build status:

curl http://localhost:8000/api/v1/preaggregation/status/orders/daily_summary

Common issues:

Source data errors
Insufficient warehouse resources
Invalid SQL expressions
Partition conflicts

Optimization Overview​

Pre-aggregations​

How Pre-aggregations Work​

Defining Pre-aggregations​

Pre-aggregation Types​

Rollup​

Original SQL​

Rollup Join​

Pre-aggregation Matching​

Checking Pre-aggregation Usage​

Managing Pre-aggregations​

Build Pre-aggregations​

Build Specific Pre-aggregation​

Check Pre-aggregation Status​

Invalidate Pre-aggregation​

Refresh Strategies​

Scheduled Refresh​

SQL-based Refresh​

Incremental Refresh​

Result Caching​

Cache Behavior​

Cache Status in Response​

Bypassing Cache​

Cache Configuration​

Query Design Patterns​

Limit Dimensions​

Filter Early​

Use Appropriate Granularity​

Pagination for Large Results​

Use Async for Heavy Queries​

Cost Estimation​

Monitoring Query Performance​

Query Metrics​

Pre-aggregation Metrics​

Optimization Checklist​

Pre-aggregation Design Tips​

Troubleshooting Slow Queries​

Query Not Using Pre-aggregation​

High Cache Miss Rate​

Pre-aggregation Build Failures​

Next Steps​

Optimization Overview

Pre-aggregations

How Pre-aggregations Work

Defining Pre-aggregations

Pre-aggregation Types

Rollup

Original SQL

Rollup Join

Pre-aggregation Matching

Checking Pre-aggregation Usage

Managing Pre-aggregations

Build Pre-aggregations

Build Specific Pre-aggregation

Check Pre-aggregation Status

Invalidate Pre-aggregation

Refresh Strategies

Scheduled Refresh

SQL-based Refresh

Incremental Refresh

Result Caching

Cache Behavior

Cache Status in Response

Bypassing Cache

Cache Configuration

Query Design Patterns

Limit Dimensions

Filter Early

Use Appropriate Granularity

Pagination for Large Results

Use Async for Heavy Queries

Cost Estimation

Monitoring Query Performance

Query Metrics

Pre-aggregation Metrics

Optimization Checklist

Pre-aggregation Design Tips

Troubleshooting Slow Queries

Query Not Using Pre-aggregation

High Cache Miss Rate

Pre-aggregation Build Failures

Next Steps