Skip to main content

Query Cost Optimization

For Data Analysts

Olytix Core's query optimization system helps you understand and control query costs before execution, set guardrails to prevent runaway queries, and receive actionable recommendations to improve performance.

Overview

┌─────────────────────────────────────────────────────────────────────┐
│ QUERY OPTIMIZATION PIPELINE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ QUERY │───▶│ COST │───▶│ GUARDRAIL │ │
│ │ ANALYSIS │ │ PREDICTION │ │ CHECK │ │
│ │ │ │ │ │ │ │
│ │ Parse query │ │ Estimate │ │ Cost limits │ │
│ │ Identify │ │ scan bytes │ │ Time limits │ │
│ │ patterns │ │ join costs │ │ Row limits │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ BUDGET │◀───│ EXECUTION │◀───│ OPTIMIZATION │ │
│ │ TRACKING │ │ DECISION │ │ ADVICE │ │
│ │ │ │ │ │ │ │
│ │ Usage stats │ │ Allow/Block │ │ Pre-agg use │ │
│ │ Alerts │ │ Warnings │ │ Filter hints │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘

Key Features

Cost Prediction

Before executing a query, understand its estimated cost:

Cost ComponentDescriptionExample
Scan CostData bytes to be scanned2.5 GB
Join CostCost of join operationsMedium
Aggregation CostGrouping and computationLow
Sort CostOrdering operationsLow
Estimated TimePredicted execution time~15 seconds
Estimated CostDollar cost (for cloud warehouses)$0.012

Pre-Aggregation Savings

See potential savings from pre-aggregations:

Cost Comparison:
├── Without Pre-Aggregation:
│ ├── Scan: 50 GB
│ ├── Estimated Time: 45s
│ └── Estimated Cost: $0.25

└── With Pre-Aggregation (matched):
├── Scan: 500 MB (99% reduction)
├── Estimated Time: 2s (95% faster)
└── Estimated Cost: $0.0025 (99% savings)

Query Guardrails

Set limits to prevent expensive queries:

GuardrailDescriptionConfigurable By
Max CostMaximum dollar cost per queryAdmin
Max BytesMaximum data to scanAdmin
Max TimeQuery timeoutAdmin/User
Max RowsMaximum rows returnedUser
Require FilterMust have time/partition filterAdmin

Budget Tracking

Monitor and control query spend:

  • User budgets - Per-user spending limits
  • Team budgets - Shared team allocations
  • Project budgets - Project-level cost tracking
  • Alerts - Notifications when approaching limits

Usage

Predict Query Cost

from olytix-core.optimization.service import OptimizationService
from olytix-core.optimization.cost.predictor import CostPredictor

service = OptimizationService()

# Predict cost before execution
prediction = await service.predict_cost(
query={
"measures": ["Orders.revenue", "Orders.count"],
"dimensions": ["Orders.region", "Orders.product_category"],
"timeDimensions": [{
"dimension": "Orders.created_at",
"dateRange": "Last 90 days"
}]
}
)

# Review prediction
print(f"Scan bytes: {prediction.scan_bytes_formatted}") # "2.5 GB"
print(f"Estimated time: {prediction.estimated_seconds}s")
print(f"Estimated cost: ${prediction.estimated_cost:.4f}")
print(f"Pre-agg available: {prediction.preagg_match}")
print(f"Savings with pre-agg: {prediction.preagg_savings_percent}%")

Cost Prediction Response

CostPrediction:
├── query_hash: "abc123"
├── scan_bytes: 2684354560 # 2.5 GB
├── scan_bytes_formatted: "2.5 GB"
├── estimated_seconds: 15.5
├── estimated_cost: 0.0125

├── cost_breakdown:
│ ├── scan_cost: 0.0100
│ ├── compute_cost: 0.0020
│ └── egress_cost: 0.0005

├── preagg_match:
│ ├── matched: true
│ ├── preagg_name: "orders_daily_by_region"
│ ├── scan_bytes_with_preagg: 52428800 # 50 MB
│ ├── savings_percent: 98.0
│ └── recommendation: "Query will automatically use pre-aggregation"

├── guardrail_status:
│ ├── within_limits: true
│ ├── cost_limit: 1.00
│ ├── time_limit: 300
│ └── warnings: []

└── optimization_hints: [
"Consider adding a partition filter for better performance",
"This query pattern could benefit from a new pre-aggregation"
]

Configure Guardrails

from olytix-core.optimization.guardrails.models import GuardrailConfig

# Set user-level guardrails
await service.set_user_guardrails(
user_id="analyst_123",
config=GuardrailConfig(
max_cost_per_query=1.00, # $1 max per query
max_bytes_scanned="10GB", # 10 GB max scan
max_execution_seconds=300, # 5 minute timeout
max_rows_returned=100000, # 100k rows max
require_time_filter=True, # Must have date filter
require_partition_filter=False
)
)

# Set team-level guardrails
await service.set_team_guardrails(
team_id="analytics_team",
config=GuardrailConfig(
max_cost_per_query=5.00,
daily_budget=100.00, # $100/day team budget
monthly_budget=2000.00, # $2000/month
alert_at_percent=80 # Alert at 80% usage
)
)

Check Query Against Guardrails

# Check before execution
check_result = await service.check_guardrails(
query=query,
user_id="analyst_123"
)

if not check_result.allowed:
print(f"Query blocked: {check_result.reason}")
print(f"Suggestions: {check_result.suggestions}")
else:
if check_result.warnings:
print(f"Warnings: {check_result.warnings}")
# Proceed with execution

Budget Tracking

# Check current usage
usage = await service.get_budget_usage(
user_id="analyst_123",
period="current_month"
)

print(f"Spent: ${usage.amount_spent:.2f}")
print(f"Budget: ${usage.budget_limit:.2f}")
print(f"Remaining: ${usage.remaining:.2f}")
print(f"Usage: {usage.percent_used}%")
print(f"Queries run: {usage.query_count}")

# Get usage breakdown
breakdown = await service.get_usage_breakdown(
user_id="analyst_123",
period="current_month",
group_by="cube"
)

# Returns spending per cube
for cube_usage in breakdown:
print(f"{cube_usage.cube_name}: ${cube_usage.amount:.2f}")

API Endpoints

Predict Cost

POST /api/v1/optimization/predict-cost
Content-Type: application/json

{
"query": {
"measures": ["Orders.revenue"],
"dimensions": ["Orders.region"],
"timeDimensions": [{
"dimension": "Orders.created_at",
"dateRange": "Last 30 days"
}]
},
"include_preagg_analysis": true
}

Check Guardrails

POST /api/v1/optimization/check-guardrails
Content-Type: application/json

{
"query": {...},
"user_id": "analyst_123"
}

Get Budget Usage

GET /api/v1/optimization/budget/usage?
user_id=analyst_123&
period=current_month

Set Guardrails

PUT /api/v1/optimization/guardrails/users/<user_id>
Content-Type: application/json

{
"max_cost_per_query": 1.00,
"max_bytes_scanned": "10GB",
"max_execution_seconds": 300,
"daily_budget": 50.00,
"require_time_filter": true
}

Get Optimization Recommendations

GET /api/v1/optimization/recommendations?
user_id=analyst_123&
period=last_7_days

Optimization Recommendations

The system analyzes query patterns and provides actionable recommendations:

Pre-Aggregation Recommendations

recommendations = await service.get_preagg_recommendations(
user_id="analyst_123",
period="last_30_days"
)

# Example recommendation:
# PreAggRecommendation(
# pattern="Orders by region, daily",
# query_count=145,
# total_cost=45.20,
# potential_savings=42.50,
# suggested_preagg={
# "name": "orders_daily_by_region",
# "measures": ["revenue", "count"],
# "dimensions": ["region"],
# "timeDimension": "created_at",
# "granularity": "day"
# }
# )

Filter Recommendations

Recommendation: Add Time Filter
├── Pattern: Queries without date filters on Orders cube
├── Occurrences: 23 queries
├── Impact: Scanning full table (500 GB) instead of partition
├── Suggestion: Add timeDimension filter to reduce scan by 95%
└── Example:
"timeDimensions": [{
"dimension": "Orders.created_at",
"dateRange": "Last 30 days"
}]

Index Recommendations

Recommendation: Add Index
├── Pattern: Frequent filters on customer_segment
├── Occurrences: 89 queries
├── Current Performance: ~12s average
├── With Index: ~2s estimated
└── SQL: CREATE INDEX idx_orders_segment ON orders(customer_segment)

Query Patterns Analysis

The system tracks and analyzes query patterns:

patterns = await service.analyze_query_patterns(
user_id="analyst_123",
period="last_30_days"
)

# QueryPatternAnalysis:
# ├── top_cubes: [("Orders", 234), ("Customers", 156), ("Products", 89)]
# ├── top_measures: [("revenue", 312), ("count", 287), ("avg_value", 145)]
# ├── top_dimensions: [("region", 198), ("category", 167), ("date", 342)]
# ├── common_filters: [("status='completed'", 245), ("region='US'", 123)]
# ├── avg_query_cost: 0.045
# ├── peak_usage_hours: [9, 10, 14, 15]
# └── optimization_opportunities: [...]

Budget Alerts

Configure alerts for budget thresholds:

from olytix-core.optimization.alerts.models import BudgetAlert

await service.create_budget_alert(
BudgetAlert(
user_id="analyst_123",
threshold_percent=80,
channels=["email", "slack"],
message_template="You have used {percent}% of your ${budget} monthly budget"
)
)

Alert Channels

ChannelConfigurationExample
EmailUser's email addressBudget alert email
SlackWebhook URL or channel#cost-alerts channel
WebhookCustom endpointPOST to internal system
In-AppDashboard notificationBell icon alert

Warehouse-Specific Features

BigQuery

  • Slot usage prediction
  • On-demand vs flat-rate cost comparison
  • Partition pruning analysis

Snowflake

  • Credit estimation
  • Warehouse size recommendations
  • Auto-suspend suggestions

PostgreSQL

  • Execution plan analysis
  • Index usage statistics
  • Table scan warnings

Best Practices

For Analysts

  1. Always preview costs before running large queries
  2. Use time filters to leverage partitioning
  3. Check for pre-aggregations that match your query
  4. Monitor your budget usage regularly

For Administrators

  1. Set reasonable defaults for guardrails
  2. Configure team budgets to prevent surprises
  3. Review recommendations monthly
  4. Create pre-aggregations for common patterns

Cost Reduction Strategies

  1. Use pre-aggregations - 90%+ savings typical
  2. Add partition filters - Reduce scan size
  3. Limit result sets - Use LIMIT when exploring
  4. Schedule heavy queries - Run during off-peak hours
  5. Share query results - Use workspaces to avoid duplicates

Configuration

# optimization config
optimization:
cost_prediction:
enabled: true
cache_ttl_seconds: 300

guardrails:
default_max_cost: 1.00
default_max_bytes: "10GB"
default_max_seconds: 300
require_time_filter: false

budgets:
default_daily_budget: 50.00
default_monthly_budget: 1000.00
alert_threshold_percent: 80

recommendations:
min_query_count: 10
min_potential_savings: 10.00
analysis_period_days: 30

Next Steps