Query Cost Optimization
Olytix Core's query optimization system helps you understand and control query costs before execution, set guardrails to prevent runaway queries, and receive actionable recommendations to improve performance.
Overview
┌─────────────────────────────────────────────────────────────────────┐
│ QUERY OPTIMIZATION PIPELINE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌─────── ───────┐ │
│ │ QUERY │───▶│ COST │───▶│ GUARDRAIL │ │
│ │ ANALYSIS │ │ PREDICTION │ │ CHECK │ │
│ │ │ │ │ │ │ │
│ │ Parse query │ │ Estimate │ │ Cost limits │ │
│ │ Identify │ │ scan bytes │ │ Time limits │ │
│ │ patterns │ │ join costs │ │ Row limits │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ BUDGET │◀───│ EXECUTION │◀───│ OPTIMIZATION │ │
│ │ TRACKING │ │ DECISION │ │ ADVICE │ │
│ │ │ │ │ │ │ │
│ │ Usage stats │ │ Allow/Block │ │ Pre-agg use │ │
│ │ Alerts │ │ Warnings │ │ Filter hints │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Key Features
Cost Prediction
Before executing a query, understand its estimated cost:
| Cost Component | Description | Example |
|---|---|---|
| Scan Cost | Data bytes to be scanned | 2.5 GB |
| Join Cost | Cost of join operations | Medium |
| Aggregation Cost | Grouping and computation | Low |
| Sort Cost | Ordering operations | Low |
| Estimated Time | Predicted execution time | ~15 seconds |
| Estimated Cost | Dollar cost (for cloud warehouses) | $0.012 |
Pre-Aggregation Savings
See potential savings from pre-aggregations:
Cost Comparison:
├── Without Pre-Aggregation:
│ ├── Scan: 50 GB
│ ├── Estimated Time: 45s
│ └── Estimated Cost: $0.25
│
└── With Pre-Aggregation (matched):
├── Scan: 500 MB (99% reduction)
├── Estimated Time: 2s (95% faster)
└── Estimated Cost: $0.0025 (99% savings)
Query Guardrails
Set limits to prevent expensive queries:
| Guardrail | Description | Configurable By |
|---|---|---|
| Max Cost | Maximum dollar cost per query | Admin |
| Max Bytes | Maximum data to scan | Admin |
| Max Time | Query timeout | Admin/User |
| Max Rows | Maximum rows returned | User |
| Require Filter | Must have time/partition filter | Admin |
Budget Tracking
Monitor and control query spend:
- User budgets - Per-user spending limits
- Team budgets - Shared team allocations
- Project budgets - Project-level cost tracking
- Alerts - Notifications when approaching limits
Usage
Predict Query Cost
from olytix-core.optimization.service import OptimizationService
from olytix-core.optimization.cost.predictor import CostPredictor
service = OptimizationService()
# Predict cost before execution
prediction = await service.predict_cost(
query={
"measures": ["Orders.revenue", "Orders.count"],
"dimensions": ["Orders.region", "Orders.product_category"],
"timeDimensions": [{
"dimension": "Orders.created_at",
"dateRange": "Last 90 days"
}]
}
)
# Review prediction
print(f"Scan bytes: {prediction.scan_bytes_formatted}") # "2.5 GB"
print(f"Estimated time: {prediction.estimated_seconds}s")
print(f"Estimated cost: ${prediction.estimated_cost:.4f}")
print(f"Pre-agg available: {prediction.preagg_match}")
print(f"Savings with pre-agg: {prediction.preagg_savings_percent}%")
Cost Prediction Response
CostPrediction:
├── query_hash: "abc123"
├── scan_bytes: 2684354560 # 2.5 GB
├── scan_bytes_formatted: "2.5 GB"
├── estimated_seconds: 15.5
├── estimated_cost: 0.0125
│
├── cost_breakdown:
│ ├── scan_cost: 0.0100
│ ├── compute_cost: 0.0020
│ └── egress_cost: 0.0005
│
├── preagg_match:
│ ├── matched: true
│ ├── preagg_name: "orders_daily_by_region"
│ ├── scan_bytes_with_preagg: 52428800 # 50 MB
│ ├── savings_percent: 98.0
│ └── recommendation: "Query will automatically use pre-aggregation"
│
├── guardrail_status:
│ ├── within_limits: true
│ ├── cost_limit: 1.00
│ ├── time_limit: 300
│ └── warnings: []
│
└── optimization_hints: [
"Consider adding a partition filter for better performance",
"This query pattern could benefit from a new pre-aggregation"
]
Configure Guardrails
from olytix-core.optimization.guardrails.models import GuardrailConfig
# Set user-level guardrails
await service.set_user_guardrails(
user_id="analyst_123",
config=GuardrailConfig(
max_cost_per_query=1.00, # $1 max per query
max_bytes_scanned="10GB", # 10 GB max scan
max_execution_seconds=300, # 5 minute timeout
max_rows_returned=100000, # 100k rows max
require_time_filter=True, # Must have date filter
require_partition_filter=False
)
)
# Set team-level guardrails
await service.set_team_guardrails(
team_id="analytics_team",
config=GuardrailConfig(
max_cost_per_query=5.00,
daily_budget=100.00, # $100/day team budget
monthly_budget=2000.00, # $2000/month
alert_at_percent=80 # Alert at 80% usage
)
)
Check Query Against Guardrails
# Check before execution
check_result = await service.check_guardrails(
query=query,
user_id="analyst_123"
)
if not check_result.allowed:
print(f"Query blocked: {check_result.reason}")
print(f"Suggestions: {check_result.suggestions}")
else:
if check_result.warnings:
print(f"Warnings: {check_result.warnings}")
# Proceed with execution
Budget Tracking
# Check current usage
usage = await service.get_budget_usage(
user_id="analyst_123",
period="current_month"
)
print(f"Spent: ${usage.amount_spent:.2f}")
print(f"Budget: ${usage.budget_limit:.2f}")
print(f"Remaining: ${usage.remaining:.2f}")
print(f"Usage: {usage.percent_used}%")
print(f"Queries run: {usage.query_count}")
# Get usage breakdown
breakdown = await service.get_usage_breakdown(
user_id="analyst_123",
period="current_month",
group_by="cube"
)
# Returns spending per cube
for cube_usage in breakdown:
print(f"{cube_usage.cube_name}: ${cube_usage.amount:.2f}")
API Endpoints
Predict Cost
POST /api/v1/optimization/predict-cost
Content-Type: application/json
{
"query": {
"measures": ["Orders.revenue"],
"dimensions": ["Orders.region"],
"timeDimensions": [{
"dimension": "Orders.created_at",
"dateRange": "Last 30 days"
}]
},
"include_preagg_analysis": true
}
Check Guardrails
POST /api/v1/optimization/check-guardrails
Content-Type: application/json
{
"query": {...},
"user_id": "analyst_123"
}
Get Budget Usage
GET /api/v1/optimization/budget/usage?
user_id=analyst_123&
period=current_month
Set Guardrails
PUT /api/v1/optimization/guardrails/users/<user_id>
Content-Type: application/json
{
"max_cost_per_query": 1.00,
"max_bytes_scanned": "10GB",
"max_execution_seconds": 300,
"daily_budget": 50.00,
"require_time_filter": true
}
Get Optimization Recommendations
GET /api/v1/optimization/recommendations?
user_id=analyst_123&
period=last_7_days
Optimization Recommendations
The system analyzes query patterns and provides actionable recommendations:
Pre-Aggregation Recommendations
recommendations = await service.get_preagg_recommendations(
user_id="analyst_123",
period="last_30_days"
)
# Example recommendation:
# PreAggRecommendation(
# pattern="Orders by region, daily",
# query_count=145,
# total_cost=45.20,
# potential_savings=42.50,
# suggested_preagg={
# "name": "orders_daily_by_region",
# "measures": ["revenue", "count"],
# "dimensions": ["region"],
# "timeDimension": "created_at",
# "granularity": "day"
# }
# )
Filter Recommendations
Recommendation: Add Time Filter
├── Pattern: Queries without date filters on Orders cube
├── Occurrences: 23 queries
├── Impact: Scanning full table (500 GB) instead of partition
├── Suggestion: Add timeDimension filter to reduce scan by 95%
└── Example:
"timeDimensions": [{
"dimension": "Orders.created_at",
"dateRange": "Last 30 days"
}]
Index Recommendations
Recommendation: Add Index
├── Pattern: Frequent filters on customer_segment
├── Occurrences: 89 queries
├── Current Performance: ~12s average
├── With Index: ~2s estimated
└── SQL: CREATE INDEX idx_orders_segment ON orders(customer_segment)
Query Patterns Analysis
The system tracks and analyzes query patterns:
patterns = await service.analyze_query_patterns(
user_id="analyst_123",
period="last_30_days"
)
# QueryPatternAnalysis:
# ├── top_cubes: [("Orders", 234), ("Customers", 156), ("Products", 89)]
# ├── top_measures: [("revenue", 312), ("count", 287), ("avg_value", 145)]
# ├── top_dimensions: [("region", 198), ("category", 167), ("date", 342)]
# ├── common_filters: [("status='completed'", 245), ("region='US'", 123)]
# ├── avg_query_cost: 0.045
# ├── peak_usage_hours: [9, 10, 14, 15]
# └── optimization_opportunities: [...]
Budget Alerts
Configure alerts for budget thresholds:
from olytix-core.optimization.alerts.models import BudgetAlert
await service.create_budget_alert(
BudgetAlert(
user_id="analyst_123",
threshold_percent=80,
channels=["email", "slack"],
message_template="You have used {percent}% of your ${budget} monthly budget"
)
)
Alert Channels
| Channel | Configuration | Example |
|---|---|---|
| User's email address | Budget alert email | |
| Slack | Webhook URL or channel | #cost-alerts channel |
| Webhook | Custom endpoint | POST to internal system |
| In-App | Dashboard notification | Bell icon alert |
Warehouse-Specific Features
BigQuery
- Slot usage prediction
- On-demand vs flat-rate cost comparison
- Partition pruning analysis
Snowflake
- Credit estimation
- Warehouse size recommendations
- Auto-suspend suggestions
PostgreSQL
- Execution plan analysis
- Index usage statistics
- Table scan warnings
Best Practices
For Analysts
- Always preview costs before running large queries
- Use time filters to leverage partitioning
- Check for pre-aggregations that match your query
- Monitor your budget usage regularly
For Administrators
- Set reasonable defaults for guardrails
- Configure team budgets to prevent surprises
- Review recommendations monthly
- Create pre-aggregations for common patterns
Cost Reduction Strategies
- Use pre-aggregations - 90%+ savings typical
- Add partition filters - Reduce scan size
- Limit result sets - Use LIMIT when exploring
- Schedule heavy queries - Run during off-peak hours
- Share query results - Use workspaces to avoid duplicates
Configuration
# optimization config
optimization:
cost_prediction:
enabled: true
cache_ttl_seconds: 300
guardrails:
default_max_cost: 1.00
default_max_bytes: "10GB"
default_max_seconds: 300
require_time_filter: false
budgets:
default_daily_budget: 50.00
default_monthly_budget: 1000.00
alert_threshold_percent: 80
recommendations:
min_query_count: 10
min_potential_savings: 10.00
analysis_period_days: 30
Next Steps
- Query Explanation - Understand query execution plans
- Pre-Aggregations - Build and manage pre-aggregations
- Anomaly Detection - Monitor cost anomalies