Query Explanation

For Data Analysts

Olytix Core's query explanation feature helps you understand exactly how your semantic queries are executed, from the logical plan through to the warehouse-specific SQL and execution details.

Overview

┌─────────────────────────────────────────────────────────────────┐
│                   QUERY EXPLANATION PIPELINE                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │   SEMANTIC   │───▶│   LOGICAL    │───▶│   PHYSICAL   │       │
│  │    QUERY     │    │     PLAN     │    │     PLAN     │       │
│  │              │    │              │    │              │       │
│  │  Measures    │    │  Operations  │    │  SQL         │       │
│  │  Dimensions  │    │  Joins       │    │  Indexes     │       │
│  │  Filters     │    │  Aggregations│    │  Scans       │       │
│  └──────────────┘    └──────────────┘    └──────┬───────┘       │
│                                                 │               │
│                                                 ▼               │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │    VISUAL    │◀───│ OPTIMIZATION │◀───│     COST     │       │
│  │    OUTPUT    │    │    HINTS     │    │  BREAKDOWN   │       │
│  │              │    │              │    │              │       │
│  │  ASCII       │    │  Pre-agg use │    │  Scan cost   │       │
│  │  Mermaid     │    │  Missing idx │    │  Join cost   │       │
│  │  DOT         │    │  Filter hints│    │  Sort cost   │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Key Features

Logical Plan

View the abstract execution plan before warehouse translation:

  • Operations - Scan, Filter, Project, Join, Aggregate, Sort
  • Data Flow - How data moves through operations
  • Cardinality Estimates - Expected row counts at each step
  • Cost Estimates - Relative cost of each operation

Physical Plan

See the warehouse-specific execution details:

  • Generated SQL - The actual SQL sent to the warehouse
  • Index Usage - Which indexes are used (or not)
  • Scan Types - Full table vs index vs partition scans
  • Join Methods - Hash, merge, nested loop joins

Pre-Aggregation Analysis

Understand if and why pre-aggregations are used:

  • Match Status - Whether a pre-agg was matched
  • Match Reason - Why it was or wasn't used
  • Alternative Pre-Aggs - Other pre-aggs that could work
  • Savings Estimate - Cost difference with/without pre-agg

Optimization Hints

Receive actionable suggestions:

  • Add missing filters
  • Create recommended indexes
  • Use different pre-aggregations
  • Restructure joins for efficiency

Usage

Explain a Query

from olytix_core.explain.service import ExplainService
from olytix_core.explain.models import ExplainFormat

service = ExplainService()

# Get full explanation
explanation = await service.explain_query(
    query={
        "measures": ["Orders.revenue", "Orders.count"],
        "dimensions": ["Orders.region", "Orders.product_category"],
        "filters": [{
            "member": "Orders.status",
            "operator": "equals",
            "values": ["completed"]
        }],
        "timeDimensions": [{
            "dimension": "Orders.created_at",
            "granularity": "month",
            "dateRange": "Last 6 months"
        }]
    },
    format=ExplainFormat.DETAILED
)

Explanation Response

QueryExplanation:
├── query_hash: "xyz789"
├── timestamp: "2024-01-15T10:30:00Z"
│
├── logical_plan:
│   ├── operation: "Aggregate"
│   ├── grouping: ["region", "product_category", "month"]
│   ├── aggregations: ["SUM(revenue)", "COUNT(*)"]
│   ├── estimated_rows: 450
│   └── child:
│       ├── operation: "Filter"
│       ├── condition: "status = 'completed' AND created_at >= '2023-07-01'"
│       ├── estimated_rows: 125000
│       └── child:
│           ├── operation: "Scan"
│           ├── table: "orders"
│           ├── columns: ["revenue", "region", "product_category", "created_at", "status"]
│           └── estimated_rows: 500000
│
├── physical_plan:
│   ├── sql: "SELECT ... FROM orders WHERE ... GROUP BY ..."
│   ├── warehouse: "postgresql"
│   ├── execution_strategy: "HashAggregate"
│   └── index_usage: [
│         {"index": "idx_orders_created_at", "used": true, "type": "range_scan"},
│         {"index": "idx_orders_status", "used": true, "type": "equality"}
│       ]
│
├── preagg_analysis:
│   ├── matched: true
│   ├── preagg_name: "orders_monthly_by_region_category"
│   ├── match_type: "exact"
│   ├── original_scan_bytes: 5368709120   # 5 GB
│   ├── preagg_scan_bytes: 10485760       # 10 MB
│   └── savings_percent: 99.8
│
├── cost_breakdown:
│   ├── total_cost: 0.012
│   ├── scan_cost: 0.008
│   ├── compute_cost: 0.003
│   └── network_cost: 0.001
│
└── optimization_hints: [
      {
        "type": "info",
        "message": "Query uses pre-aggregation 'orders_monthly_by_region_category'",
        "impact": "99.8% scan reduction"
      }
    ]
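The savings_percent field follows directly from the two scan sizes in the response. A quick check, using the byte counts from the example above:

```python
# Scan sizes taken from the example explanation response
original_scan_bytes = 5_368_709_120   # ~5 GB base-table scan
preagg_scan_bytes = 10_485_760        # ~10 MB pre-aggregation scan

# Fraction of scanned bytes avoided, expressed as a percentage
savings_percent = round((1 - preagg_scan_bytes / original_scan_bytes) * 100, 1)
print(savings_percent)  # 99.8
```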

Visual Plan Formats

ASCII Format

plan_ascii = await service.explain_query(
    query=query,
    format=ExplainFormat.ASCII
)

# Output:
# ┌─────────────────────────────────────┐
# │             Aggregate               │
# │  GROUP BY: region, category, month  │
# │  Rows: ~450                         │
# └─────────────────┬───────────────────┘
#                   │
#         ┌─────────┴─────────┐
#         │      Filter       │
#         │ status='completed'│
#         │ date >= 2023-07   │
#         │ Rows: ~125,000    │
#         └─────────┬─────────┘
#                   │
#         ┌─────────┴─────────┐
#         │   Scan: orders    │
#         │  Rows: ~500,000   │
#         │  [Using pre-agg]  │
#         └───────────────────┘

Mermaid Format

plan_mermaid = await service.explain_query(
    query=query,
    format=ExplainFormat.MERMAID
)

# Output:
# graph TD
#     A[Aggregate<br/>GROUP BY: region, category, month<br/>~450 rows]
#     B[Filter<br/>status='completed', date >= 2023-07<br/>~125,000 rows]
#     C[Scan: orders<br/>~500,000 rows<br/>Using pre-agg]
#     A --> B
#     B --> C

DOT Format (Graphviz)

plan_dot = await service.explain_query(
    query=query,
    format=ExplainFormat.DOT
)

# Output: Graphviz DOT notation for visualization tools

Compare Query Plans

# Compare two query variations
comparison = await service.compare_plans(
    query_a={
        "measures": ["Orders.revenue"],
        "dimensions": ["Orders.region"],
        "timeDimensions": [{
            "dimension": "Orders.created_at",
            "dateRange": "Last year"
        }]
    },
    query_b={
        "measures": ["Orders.revenue"],
        "dimensions": ["Orders.region"],
        "timeDimensions": [{
            "dimension": "Orders.created_at",
            "dateRange": "Last 30 days"
        }]
    }
)

# PlanComparison:
# ├── query_a_cost: 0.15
# ├── query_b_cost: 0.02
# ├── cost_difference: -87%
# ├── query_a_scan_bytes: "15 GB"
# ├── query_b_scan_bytes: "1.2 GB"
# ├── differences: [
# │     "Query B scans 92% less data due to narrower date range",
# │     "Both queries use the same pre-aggregation"
# │   ]
# └── recommendation: "Use shorter date ranges when possible"
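The cost_difference figure is an ordinary percent change between the two plan costs. A quick check with the values above (the helper name is ours, not part of the API):

```python
def percent_change(before: float, after: float) -> int:
    """Signed percent change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)

# Costs from the PlanComparison output above
print(percent_change(0.15, 0.02))  # -87
```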

Explain SQL Directly

# Explain raw SQL
sql_explanation = await service.explain_sql(
    sql="""
        SELECT region, SUM(revenue) AS total_revenue
        FROM orders
        WHERE status = 'completed'
          AND created_at >= '2024-01-01'
        GROUP BY region
        ORDER BY total_revenue DESC
    """,
    warehouse="postgresql"
)

API Endpoints

Explain Query

POST /api/v1/explain/query
Content-Type: application/json

{
  "query": {
    "measures": ["Orders.revenue"],
    "dimensions": ["Orders.region"],
    "timeDimensions": [{
      "dimension": "Orders.created_at",
      "dateRange": "Last 30 days"
    }]
  },
  "format": "detailed",
  "include_sql": true,
  "include_cost": true
}

Get Plan Visualization

GET /api/v1/explain/query/<query_hash>/visualization?format=mermaid

Compare Plans

POST /api/v1/explain/compare
Content-Type: application/json

{
  "query_a": {...},
  "query_b": {...}
}

Explain SQL

POST /api/v1/explain/sql
Content-Type: application/json

{
  "sql": "SELECT ...",
  "warehouse": "postgresql"
}

Understanding the Logical Plan

Operation Types

Operation   Description                    Performance Impact
Scan        Read data from table/pre-agg   Base cost, affected by filters
Filter      Apply WHERE conditions         Reduces rows, low cost
Project     Select columns                 Very low cost
Join        Combine tables                 Can be expensive
Aggregate   GROUP BY and aggregations      Moderate cost
Sort        ORDER BY                       Can be expensive for large sets
Limit       LIMIT/OFFSET                   Very low cost
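A logical plan is a chain of operation nodes, each with an optional child. A minimal sketch of walking such a chain root-to-leaf, using nested dicts shaped like the explanation response shown earlier (the dict layout is an illustration, not the exact object model):

```python
# Nested dicts mirroring the logical_plan in the example explanation response
plan = {
    "operation": "Aggregate", "estimated_rows": 450,
    "child": {
        "operation": "Filter", "estimated_rows": 125_000,
        "child": {"operation": "Scan", "estimated_rows": 500_000},
    },
}

def walk(node):
    """Yield (operation, estimated_rows) pairs from root to leaf."""
    while node is not None:
        yield node["operation"], node["estimated_rows"]
        node = node.get("child")

for op, rows in walk(plan):
    print(f"{op}: ~{rows:,} rows")
```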

Reading Cardinality Estimates

Scan: orders (~500,000 rows)
└── Filter: status='completed' (~350,000 rows) [30% filtered]
    └── Filter: date >= 2024-01 (~125,000 rows) [64% filtered]
        └── Aggregate: GROUP BY region (~15 rows)
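The bracketed percentages are simply the share of input rows each step removes. A one-line check against the figures above (the helper name is ours):

```python
def filtered_percent(rows_in: int, rows_out: int) -> int:
    """Percentage of input rows removed by a plan step."""
    return round((1 - rows_out / rows_in) * 100)

print(filtered_percent(500_000, 350_000))  # status filter -> 30
print(filtered_percent(350_000, 125_000))  # date filter   -> 64
```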

Understanding Physical Plans

Index Usage

index_usage = explanation.physical_plan.index_usage

# Example output:
# [
#   {"index": "idx_orders_created_at", "used": True, "type": "range_scan"},
#   {"index": "idx_orders_status", "used": True, "type": "equality"},
#   {"index": "idx_orders_customer", "used": False, "reason": "not in query"}
# ]
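Given a list shaped like that example output (the list-of-dicts shape is assumed here), pulling out the indexes a query did not use is a one-liner:

```python
# Index-usage entries shaped like the example output above
index_usage = [
    {"index": "idx_orders_created_at", "used": True, "type": "range_scan"},
    {"index": "idx_orders_status", "used": True, "type": "equality"},
    {"index": "idx_orders_customer", "used": False, "reason": "not in query"},
]

# Names of indexes the planner skipped
unused = [entry["index"] for entry in index_usage if not entry["used"]]
print(unused)  # ['idx_orders_customer']
```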

Join Methods

Method        Best For                        Characteristics
Hash Join     Large tables, equality joins    Memory-intensive, fast
Merge Join    Sorted data, equality joins     I/O efficient
Nested Loop   Small tables, inequality joins  Simple, can be slow
Index Join    Indexed lookup tables           Very fast for small lookups

Scan Types

Scan Type     Description                 Performance
Sequential    Full table scan             Slowest, reads all rows
Index Scan    Use index to find rows      Fast for selective queries
Index Only    Data from index alone       Fastest, no table access
Bitmap Scan   Combine multiple indexes    Good for OR conditions

Pre-Aggregation Analysis

Match Types

# Exact match - pre-agg has exactly the measures/dimensions needed
preagg_analysis.match_type = "exact"

# Superset match - pre-agg has more than needed, will filter
preagg_analysis.match_type = "superset"

# Rollup match - pre-agg can be rolled up (e.g., daily to monthly)
preagg_analysis.match_type = "rollup"

# No match - query must hit base table
preagg_analysis.match_type = "none"
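For programmatic checks, the four match types reduce to one question: can the query avoid the base table? A sketch under that reading (the helper is ours, not part of the API):

```python
def uses_preagg(match_type: str) -> bool:
    """True when the query can be served from a pre-aggregation
    (exact, superset, or rollup match); False for "none"."""
    return match_type in {"exact", "superset", "rollup"}

print(uses_preagg("rollup"))  # True
print(uses_preagg("none"))    # False
```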

Why Pre-Agg Not Used

# Example: Pre-agg not matched
preagg_analysis = {
"matched": False,
"candidates_checked": [
{
"name": "orders_daily_by_region",
"rejected_reason": "missing dimension: product_category"
},
{
"name": "orders_monthly_summary",
"rejected_reason": "granularity too coarse (monthly vs daily)"
}
],
"suggestion": "Create pre-aggregation with dimensions: region, product_category"
}

Optimization Hints

Hint Types

Type         Severity   Description
info         Low        Informational, no action needed
suggestion   Medium     Could improve performance
warning      High       Significant performance issue
critical     Urgent     Query may timeout or fail

Example Hints

optimization_hints = [
    {
        "type": "warning",
        "category": "missing_filter",
        "message": "Query scans entire orders table without partition filter",
        "suggestion": "Add timeDimension filter to leverage partitioning",
        "estimated_improvement": "90% scan reduction"
    },
    {
        "type": "suggestion",
        "category": "preagg_opportunity",
        "message": "This query pattern runs frequently without pre-aggregation",
        "suggestion": "Create pre-aggregation: orders_by_region_category_monthly",
        "estimated_improvement": "95% cost reduction"
    },
    {
        "type": "info",
        "category": "preagg_used",
        "message": "Query uses pre-aggregation 'orders_daily_summary'",
        "impact": "98% scan reduction achieved"
    }
]
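Hints like these can be triaged by mapping each type to its severity rank. A minimal sketch (the ranking dict is our convention; the type values come from the severity table above):

```python
# Lower rank = more urgent (our convention, per the severity table)
SEVERITY = {"critical": 0, "warning": 1, "suggestion": 2, "info": 3}

hints = [
    {"type": "info", "category": "preagg_used"},
    {"type": "warning", "category": "missing_filter"},
    {"type": "suggestion", "category": "preagg_opportunity"},
]

triaged = sorted(hints, key=lambda h: SEVERITY[h["type"]])
print([h["type"] for h in triaged])  # ['warning', 'suggestion', 'info']
```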

Best Practices

When to Use Explain

  1. Debugging slow queries - Understand why a query is slow
  2. Before production - Verify query efficiency
  3. After schema changes - Check impact on execution
  4. Learning - Understand how semantic queries translate to SQL

Reading Plans Effectively

  1. Start from the bottom - Data flows upward
  2. Look at row estimates - Big drops indicate effective filters
  3. Check index usage - Sequential scans on large tables are red flags
  4. Review pre-agg match - Not using available pre-aggs wastes resources

Acting on Hints

  1. Address warnings first - Biggest impact
  2. Review suggestions monthly - Prioritize by frequency
  3. Create pre-aggregations - For repeated patterns
  4. Add indexes - For common filter columns

Configuration

# explain configuration
explain:
  enabled: true

  cache:
    enabled: true
    ttl_seconds: 300

  formats:
    ascii:
      max_depth: 10
      show_costs: true
    mermaid:
      direction: "TD"  # TB, BT, LR, RL
    dot:
      rankdir: "TB"

  hints:
    enabled: true
    categories:
      - missing_filter
      - preagg_opportunity
      - index_suggestion
      - join_optimization

Next Steps