Skip to main content

Query Optimization

For Data Analysts

Optimizing query performance is essential for responsive analytics. This page covers pre-aggregations, caching strategies, and query design patterns that improve execution speed.

Optimization Overview

Olytix Core provides several optimization layers:

┌─────────────────────────────────────────────────────────────────┐
│ Query Request │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Result Cache │ │
│ │ Cached query results with TTL │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ cache miss │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Pre-aggregation Matching │ │
│ │ Route to pre-computed tables if available │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ no match │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Query Planner │ │
│ │ Optimized SQL generation │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Warehouse Execution │ │
│ │ Execute against data warehouse │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘

Pre-aggregations

Pre-aggregations are pre-computed summary tables that dramatically speed up common queries by avoiding full table scans.

How Pre-aggregations Work

  1. Olytix Core analyzes your cube definitions for pre-aggregation configurations
  2. Builds materialized tables with aggregated data
  3. Automatically routes matching queries to pre-aggregation tables
  4. Refreshes data on schedule or on-demand

Defining Pre-aggregations

Add pre-aggregations to your cube definition:

cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"

measures:
- name: total_revenue
type: sum
sql: total_amount

- name: order_count
type: count

dimensions:
- name: region
type: string
sql: region

- name: order_date
type: time
sql: order_date

pre_aggregations:
- name: orders_by_region_daily
measures:
- total_revenue
- order_count
dimensions:
- region
time_dimension: order_date
granularity: day
refresh_key:
every: "1 hour"

Pre-aggregation Types

Rollup

Aggregates data by specified dimensions and time granularity:

pre_aggregations:
- name: monthly_region_rollup
type: rollup
measures:
- total_revenue
- order_count
dimensions:
- region
- status
time_dimension: order_date
granularity: month

Original SQL

Stores the original cube SQL result for queries without aggregation:

pre_aggregations:
- name: recent_orders
type: original_sql
partition_granularity: day
time_dimension: order_date
refresh_key:
every: "30 minutes"

Rollup Join

Pre-aggregates data across joined cubes:

pre_aggregations:
- name: orders_with_customers
type: rollup_join
measures:
- orders.total_revenue
dimensions:
- orders.region
- customers.segment
rollups:
- orders.orders_by_region_daily
- customers.customers_by_segment

Pre-aggregation Matching

Olytix Core automatically matches queries to pre-aggregations when:

  1. All requested measures are in the pre-aggregation
  2. All requested dimensions are in the pre-aggregation
  3. Time granularity is equal or coarser than pre-aggregation
  4. Pre-aggregation is fresh (not stale)

Query that matches:

{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region"],
"time_dimensions": [
{
"dimension": "orders.order_date",
"granularity": "month"
}
]
}

This matches orders_by_region_daily because:

  • total_revenue is included
  • region is included
  • month is coarser than day (can aggregate daily to monthly)

Checking Pre-aggregation Usage

Use the explain endpoint to verify pre-aggregation routing:

curl -X POST http://localhost:8000/api/v1/query/explain \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region"]
}'

Response:

{
"cubes": ["orders"],
"uses_pre_aggregation": true,
"pre_aggregation": "orders_by_region_daily",
"steps": [
{
"type": "pre_aggregation",
"description": "Use pre-aggregation: orders_by_region_daily"
}
]
}

Managing Pre-aggregations

Build Pre-aggregations

# Build all pre-aggregations for a cube
curl -X POST http://localhost:8000/api/v1/preaggregation/build \
-H "Content-Type: application/json" \
-d '{
"cube_name": "orders",
"force": false
}'

Build Specific Pre-aggregation

curl -X POST http://localhost:8000/api/v1/preaggregation/build \
-H "Content-Type: application/json" \
-d '{
"cube_name": "orders",
"preagg_name": "orders_by_region_daily",
"force": true
}'

Check Pre-aggregation Status

curl http://localhost:8000/api/v1/preaggregation/status/orders/orders_by_region_daily

Response:

{
"cube_name": "orders",
"preagg_name": "orders_by_region_daily",
"status": "ready",
"last_built_at": "2024-01-20T10:00:00Z",
"row_count": 50000,
"size_bytes": 2500000,
"is_stale": false
}

Invalidate Pre-aggregation

Force rebuild on next query:

curl -X POST http://localhost:8000/api/v1/preaggregation/invalidate \
-H "Content-Type: application/json" \
-d '{
"cube_name": "orders",
"preagg_name": "orders_by_region_daily"
}'

Refresh Strategies

Scheduled Refresh

Refresh on a fixed schedule:

pre_aggregations:
- name: daily_summary
measures: [total_revenue]
dimensions: [region]
time_dimension: order_date
granularity: day
refresh_key:
every: "1 hour"

SQL-based Refresh

Refresh when source data changes:

pre_aggregations:
- name: orders_summary
measures: [total_revenue]
dimensions: [region]
refresh_key:
sql: "SELECT MAX(updated_at) FROM fct_orders"

Incremental Refresh

Only process new data:

pre_aggregations:
- name: orders_daily
measures: [total_revenue]
dimensions: [region]
time_dimension: order_date
granularity: day
partition_granularity: month
build_range_start: "2024-01-01"
build_range_end: "now"
incremental: true
update_window: "7 days"

Result Caching

Olytix Core caches query results to serve identical queries instantly.

Cache Behavior

  • Queries with identical parameters return cached results
  • Cache TTL is configurable (default: 5 minutes)
  • Cache is automatically invalidated when pre-aggregations rebuild
  • Different users may have different cache entries (due to security context)

Cache Status in Response

{
"data": [...],
"execution_time_ms": 2,
"cached": true,
"cache_key": "q_abc123"
}

Bypassing Cache

Force fresh execution:

curl -X POST "http://localhost:8000/api/v1/query?use_cache=false" \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region"]
}'

Cache Configuration

Set cache TTL in your configuration:

# olytix-core_project.yml
cache:
default_ttl_seconds: 300
max_entries: 10000

Query Design Patterns

Limit Dimensions

Fewer dimensions = faster queries:

// Slower: Many dimensions
{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region", "orders.status", "orders.payment_method", "orders.shipping_method"]
}

// Faster: Focused dimensions
{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region"]
}

Filter Early

Apply filters to reduce data volume:

{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.customer_id"],
"filters": [
{
"member": "orders.order_date",
"operator": "inDateRange",
"values": ["2024-01-01", "2024-01-31"]
}
]
}

Use Appropriate Granularity

Match granularity to analysis needs:

// Daily granularity for detailed analysis
{
"time_dimensions": [
{ "dimension": "orders.order_date", "granularity": "day" }
]
}

// Monthly granularity for trend analysis (faster)
{
"time_dimensions": [
{ "dimension": "orders.order_date", "granularity": "month" }
]
}

Pagination for Large Results

Avoid loading massive result sets:

{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.customer_id"],
"order": [{ "member": "orders.total_revenue", "direction": "desc" }],
"limit": 100,
"offset": 0
}

Use Async for Heavy Queries

Route large queries to async execution:

curl -X POST http://localhost:8000/api/v1/query/async \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.customer_id"],
"limit": 500000
}'

Cost Estimation

Before running expensive queries, estimate the cost:

curl -X POST http://localhost:8000/api/v1/query/estimate \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.customer_id", "orders.order_date.day"]
}'

Response:

{
"estimated_rows": 500000,
"estimated_bytes": 50000000,
"estimated_cost": 500.0,
"estimated_time_ms": 5000,
"uses_pre_aggregation": false,
"execution_location": "warehouse"
}

Use this to:

  • Warn users before expensive queries
  • Automatically route to async execution
  • Plan pre-aggregation coverage

Monitoring Query Performance

Query Metrics

Track query performance over time:

curl http://localhost:8000/api/v1/metrics/query

Response:

{
"total_queries": 15420,
"cache_hit_rate": 0.72,
"avg_execution_time_ms": 145,
"p95_execution_time_ms": 890,
"pre_aggregation_hit_rate": 0.65,
"queries_by_cube": {
"orders": 8540,
"customers": 4200,
"products": 2680
}
}

Pre-aggregation Metrics

Monitor pre-aggregation effectiveness:

curl http://localhost:8000/api/v1/preaggregation/metrics

Response:

{
"total_preaggregations": 12,
"ready_count": 10,
"building_count": 1,
"stale_count": 1,
"total_size_bytes": 125000000,
"cache_hit_rate": 0.78,
"avg_build_time_seconds": 45
}

Optimization Checklist

Use this checklist to optimize your queries:

  • Pre-aggregations defined for common query patterns
  • Time dimension granularity matches pre-aggregations
  • Filters applied to reduce data volume
  • Pagination used for large result sets
  • Cache enabled for repeated queries
  • Async execution used for heavy queries
  • Query cost estimated before execution
  • Performance metrics monitored regularly

Pre-aggregation Design Tips

  1. Cover common queries - Analyze query logs to identify patterns
  2. Match dashboard granularity - Pre-aggregate at the granularity users need
  3. Include filter dimensions - Add dimensions commonly used in filters
  4. Balance storage vs speed - More granular = larger tables
  5. Schedule off-peak refreshes - Avoid contention with user queries
  6. Monitor staleness - Set appropriate refresh intervals

Troubleshooting Slow Queries

Query Not Using Pre-aggregation

Check why with explain:

curl -X POST http://localhost:8000/api/v1/query/explain \
-H "Content-Type: application/json" \
-d '{"measures": ["orders.total_revenue"], "dimensions": ["orders.customer_id"]}'

Common reasons:

  • Dimension not in pre-aggregation
  • Granularity too fine
  • Pre-aggregation is stale or building
  • Multiple cubes without rollup_join

High Cache Miss Rate

Causes and solutions:

  • High query variety: Standardize common queries
  • Short TTL: Increase cache duration
  • Unique parameters: Remove unnecessary variation
  • Security context: Expected for multi-tenant

Pre-aggregation Build Failures

Check build status:

curl http://localhost:8000/api/v1/preaggregation/status/orders/daily_summary

Common issues:

  • Source data errors
  • Insufficient warehouse resources
  • Invalid SQL expressions
  • Partition conflicts

Next Steps