Query Optimization
Optimizing query performance is essential for responsive analytics. This page covers pre-aggregations, caching strategies, and query design patterns that improve execution speed.
Optimization Overview
Olytix Core provides several optimization layers:
┌─────────────────────────────────────────────────────────────────┐
│ Query Request │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Result Cache │ │
│ │ Cached query results with TTL │ │
│ └──────── ──────────────────────────────────────────────────┘ │
│ │ │
│ cache miss │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Pre-aggregation Matching │ │
│ │ Route to pre-computed tables if available │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ no match │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Query Planner │ │
│ │ Optimized SQL generation │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Warehouse Execution │ │
│ │ Execute against data warehouse │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Pre-aggregations
Pre-aggregations are pre-computed summary tables that dramatically speed up common queries by avoiding full table scans.
How Pre-aggregations Work
- Olytix Core analyzes your cube definitions for pre-aggregation configurations
- Builds materialized tables with aggregated data
- Automatically routes matching queries to pre-aggregation tables
- Refreshes data on schedule or on-demand
Defining Pre-aggregations
Add pre-aggregations to your cube definition:
cubes:
- name: orders
sql: "SELECT * FROM {{ ref('fct_orders') }}"
measures:
- name: total_revenue
type: sum
sql: total_amount
- name: order_count
type: count
dimensions:
- name: region
type: string
sql: region
- name: order_date
type: time
sql: order_date
pre_aggregations:
- name: orders_by_region_daily
measures:
- total_revenue
- order_count
dimensions:
- region
time_dimension: order_date
granularity: day
refresh_key:
every: "1 hour"
Pre-aggregation Types
Rollup
Aggregates data by specified dimensions and time granularity:
pre_aggregations:
- name: monthly_region_rollup
type: rollup
measures:
- total_revenue
- order_count
dimensions:
- region
- status
time_dimension: order_date
granularity: month
Original SQL
Stores the original cube SQL result for queries without aggregation:
pre_aggregations:
- name: recent_orders
type: original_sql
partition_granularity: day
time_dimension: order_date
refresh_key:
every: "30 minutes"
Rollup Join
Pre-aggregates data across joined cubes:
pre_aggregations:
- name: orders_with_customers
type: rollup_join
measures:
- orders.total_revenue
dimensions:
- orders.region
- customers.segment
rollups:
- orders.orders_by_region_daily
- customers.customers_by_segment
Pre-aggregation Matching
Olytix Core automatically matches queries to pre-aggregations when:
- All requested measures are in the pre-aggregation
- All requested dimensions are in the pre-aggregation
- Time granularity is equal or coarser than pre-aggregation
- Pre-aggregation is fresh (not stale)
Query that matches:
{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region"],
"time_dimensions": [
{
"dimension": "orders.order_date",
"granularity": "month"
}
]
}
This matches orders_by_region_daily because:
total_revenueis includedregionis includedmonthis coarser thanday(can aggregate daily to monthly)
Checking Pre-aggregation Usage
Use the explain endpoint to verify pre-aggregation routing:
curl -X POST http://localhost:8000/api/v1/query/explain \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region"]
}'
Response:
{
"cubes": ["orders"],
"uses_pre_aggregation": true,
"pre_aggregation": "orders_by_region_daily",
"steps": [
{
"type": "pre_aggregation",
"description": "Use pre-aggregation: orders_by_region_daily"
}
]
}
Managing Pre-aggregations
Build Pre-aggregations
# Build all pre-aggregations for a cube
curl -X POST http://localhost:8000/api/v1/preaggregation/build \
-H "Content-Type: application/json" \
-d '{
"cube_name": "orders",
"force": false
}'
Build Specific Pre-aggregation
curl -X POST http://localhost:8000/api/v1/preaggregation/build \
-H "Content-Type: application/json" \
-d '{
"cube_name": "orders",
"preagg_name": "orders_by_region_daily",
"force": true
}'
Check Pre-aggregation Status
curl http://localhost:8000/api/v1/preaggregation/status/orders/orders_by_region_daily
Response:
{
"cube_name": "orders",
"preagg_name": "orders_by_region_daily",
"status": "ready",
"last_built_at": "2024-01-20T10:00:00Z",
"row_count": 50000,
"size_bytes": 2500000,
"is_stale": false
}
Invalidate Pre-aggregation
Force rebuild on next query:
curl -X POST http://localhost:8000/api/v1/preaggregation/invalidate \
-H "Content-Type: application/json" \
-d '{
"cube_name": "orders",
"preagg_name": "orders_by_region_daily"
}'
Refresh Strategies
Scheduled Refresh
Refresh on a fixed schedule:
pre_aggregations:
- name: daily_summary
measures: [total_revenue]
dimensions: [region]
time_dimension: order_date
granularity: day
refresh_key:
every: "1 hour"
SQL-based Refresh
Refresh when source data changes:
pre_aggregations:
- name: orders_summary
measures: [total_revenue]
dimensions: [region]
refresh_key:
sql: "SELECT MAX(updated_at) FROM fct_orders"
Incremental Refresh
Only process new data:
pre_aggregations:
- name: orders_daily
measures: [total_revenue]
dimensions: [region]
time_dimension: order_date
granularity: day
partition_granularity: month
build_range_start: "2024-01-01"
build_range_end: "now"
incremental: true
update_window: "7 days"
Result Caching
Olytix Core caches query results to serve identical queries instantly.
Cache Behavior
- Queries with identical parameters return cached results
- Cache TTL is configurable (default: 5 minutes)
- Cache is automatically invalidated when pre-aggregations rebuild
- Different users may have different cache entries (due to security context)
Cache Status in Response
{
"data": [...],
"execution_time_ms": 2,
"cached": true,
"cache_key": "q_abc123"
}
Bypassing Cache
Force fresh execution:
curl -X POST "http://localhost:8000/api/v1/query?use_cache=false" \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region"]
}'
Cache Configuration
Set cache TTL in your configuration:
# olytix-core_project.yml
cache:
default_ttl_seconds: 300
max_entries: 10000
Query Design Patterns
Limit Dimensions
Fewer dimensions = faster queries:
// Slower: Many dimensions
{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region", "orders.status", "orders.payment_method", "orders.shipping_method"]
}
// Faster: Focused dimensions
{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.region"]
}
Filter Early
Apply filters to reduce data volume:
{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.customer_id"],
"filters": [
{
"member": "orders.order_date",
"operator": "inDateRange",
"values": ["2024-01-01", "2024-01-31"]
}
]
}
Use Appropriate Granularity
Match granularity to analysis needs:
// Daily granularity for detailed analysis
{
"time_dimensions": [
{ "dimension": "orders.order_date", "granularity": "day" }
]
}
// Monthly granularity for trend analysis (faster)
{
"time_dimensions": [
{ "dimension": "orders.order_date", "granularity": "month" }
]
}
Pagination for Large Results
Avoid loading massive result sets:
{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.customer_id"],
"order": [{ "member": "orders.total_revenue", "direction": "desc" }],
"limit": 100,
"offset": 0
}
Use Async for Heavy Queries
Route large queries to async execution:
curl -X POST http://localhost:8000/api/v1/query/async \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.customer_id"],
"limit": 500000
}'
Cost Estimation
Before running expensive queries, estimate the cost:
curl -X POST http://localhost:8000/api/v1/query/estimate \
-H "Content-Type: application/json" \
-d '{
"measures": ["orders.total_revenue"],
"dimensions": ["orders.customer_id", "orders.order_date.day"]
}'
Response:
{
"estimated_rows": 500000,
"estimated_bytes": 50000000,
"estimated_cost": 500.0,
"estimated_time_ms": 5000,
"uses_pre_aggregation": false,
"execution_location": "warehouse"
}
Use this to:
- Warn users before expensive queries
- Automatically route to async execution
- Plan pre-aggregation coverage
Monitoring Query Performance
Query Metrics
Track query performance over time:
curl http://localhost:8000/api/v1/metrics/query
Response:
{
"total_queries": 15420,
"cache_hit_rate": 0.72,
"avg_execution_time_ms": 145,
"p95_execution_time_ms": 890,
"pre_aggregation_hit_rate": 0.65,
"queries_by_cube": {
"orders": 8540,
"customers": 4200,
"products": 2680
}
}
Pre-aggregation Metrics
Monitor pre-aggregation effectiveness:
curl http://localhost:8000/api/v1/preaggregation/metrics
Response:
{
"total_preaggregations": 12,
"ready_count": 10,
"building_count": 1,
"stale_count": 1,
"total_size_bytes": 125000000,
"cache_hit_rate": 0.78,
"avg_build_time_seconds": 45
}
Optimization Checklist
Use this checklist to optimize your queries:
- Pre-aggregations defined for common query patterns
- Time dimension granularity matches pre-aggregations
- Filters applied to reduce data volume
- Pagination used for large result sets
- Cache enabled for repeated queries
- Async execution used for heavy queries
- Query cost estimated before execution
- Performance metrics monitored regularly
Pre-aggregation Design Tips
- Cover common queries - Analyze query logs to identify patterns
- Match dashboard granularity - Pre-aggregate at the granularity users need
- Include filter dimensions - Add dimensions commonly used in filters
- Balance storage vs speed - More granular = larger tables
- Schedule off-peak refreshes - Avoid contention with user queries
- Monitor staleness - Set appropriate refresh intervals
Troubleshooting Slow Queries
Query Not Using Pre-aggregation
Check why with explain:
curl -X POST http://localhost:8000/api/v1/query/explain \
-H "Content-Type: application/json" \
-d '{"measures": ["orders.total_revenue"], "dimensions": ["orders.customer_id"]}'
Common reasons:
- Dimension not in pre-aggregation
- Granularity too fine
- Pre-aggregation is stale or building
- Multiple cubes without rollup_join
High Cache Miss Rate
Causes and solutions:
- High query variety: Standardize common queries
- Short TTL: Increase cache duration
- Unique parameters: Remove unnecessary variation
- Security context: Expected for multi-tenant
Pre-aggregation Build Failures
Check build status:
curl http://localhost:8000/api/v1/preaggregation/status/orders/daily_summary
Common issues:
- Source data errors
- Insufficient warehouse resources
- Invalid SQL expressions
- Partition conflicts