Query Explanation

For Data Analysts

Olytix Core's query explanation feature helps you understand exactly how your semantic queries are executed, from the logical plan through to the warehouse-specific SQL and execution details.

Overview

┌─────────────────────────────────────────────────────────────────┐
│                   QUERY EXPLANATION PIPELINE                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │   SEMANTIC   │───▶│   LOGICAL    │───▶│   PHYSICAL   │       │
│  │    QUERY     │    │     PLAN     │    │     PLAN     │       │
│  │              │    │              │    │              │       │
│  │  Measures    │    │  Operations  │    │  SQL         │       │
│  │  Dimensions  │    │  Joins       │    │  Indexes     │       │
│  │  Filters     │    │  Aggregations│    │  Scans       │       │
│  └──────────────┘    └──────────────┘    └──────┬───────┘       │
│                                                 │               │
│                                                 ▼               │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐       │
│  │    VISUAL    │◀───│ OPTIMIZATION │◀───│     COST     │       │
│  │    OUTPUT    │    │    HINTS     │    │  BREAKDOWN   │       │
│  │              │    │              │    │              │       │
│  │  ASCII       │    │  Pre-agg use │    │  Scan cost   │       │
│  │  Mermaid     │    │  Missing idx │    │  Join cost   │       │
│  │  DOT         │    │  Filter hints│    │  Sort cost   │       │
│  └──────────────┘    └──────────────┘    └──────────────┘       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Key Features

Logical Plan

View the abstract execution plan before warehouse translation:

  • Operations - Scan, Filter, Project, Join, Aggregate, Sort
  • Data Flow - How data moves through operations
  • Cardinality Estimates - Expected row counts at each step
  • Cost Estimates - Relative cost of each operation

Physical Plan

See the warehouse-specific execution details:

  • Generated SQL - The actual SQL sent to the warehouse
  • Index Usage - Which indexes are used (or not)
  • Scan Types - Full table vs index vs partition scans
  • Join Methods - Hash, merge, nested loop joins

Pre-Aggregation Analysis

Understand if and why pre-aggregations are used:

  • Match Status - Whether a pre-agg was matched
  • Match Reason - Why it was or wasn't used
  • Alternative Pre-Aggs - Other pre-aggs that could work
  • Savings Estimate - Cost difference with/without pre-agg

Optimization Hints

Receive actionable suggestions:

  • Add missing filters
  • Create recommended indexes
  • Use different pre-aggregations
  • Restructure joins for efficiency

Usage

Explain a Query

from olytix_core.explain.service import ExplainService
from olytix_core.explain.models import ExplainFormat

service = ExplainService()

# Get full explanation
explanation = await service.explain_query(
    query={
        "measures": ["Orders.revenue", "Orders.count"],
        "dimensions": ["Orders.region", "Orders.product_category"],
        "filters": [{
            "member": "Orders.status",
            "operator": "equals",
            "values": ["completed"]
        }],
        "timeDimensions": [{
            "dimension": "Orders.created_at",
            "granularity": "month",
            "dateRange": "Last 6 months"
        }]
    },
    format=ExplainFormat.DETAILED
)

Explanation Response

QueryExplanation:
├── query_hash: "xyz789"
├── timestamp: "2024-01-15T10:30:00Z"
│
├── logical_plan:
│   ├── operation: "Aggregate"
│   ├── grouping: ["region", "product_category", "month"]
│   ├── aggregations: ["SUM(revenue)", "COUNT(*)"]
│   ├── estimated_rows: 450
│   └── child:
│       ├── operation: "Filter"
│       ├── condition: "status = 'completed' AND created_at >= '2023-07-01'"
│       ├── estimated_rows: 125000
│       └── child:
│           ├── operation: "Scan"
│           ├── table: "orders"
│           ├── columns: ["revenue", "region", "product_category", "created_at", "status"]
│           └── estimated_rows: 500000
│
├── physical_plan:
│   ├── sql: "SELECT ... FROM orders WHERE ... GROUP BY ..."
│   ├── warehouse: "postgresql"
│   ├── execution_strategy: "HashAggregate"
│   └── index_usage: [
│         {"index": "idx_orders_created_at", "used": true, "type": "range_scan"},
│         {"index": "idx_orders_status", "used": true, "type": "equality"}
│       ]
│
├── preagg_analysis:
│   ├── matched: true
│   ├── preagg_name: "orders_monthly_by_region_category"
│   ├── match_type: "exact"
│   ├── original_scan_bytes: 5368709120   # 5 GB
│   ├── preagg_scan_bytes: 10485760       # 10 MB
│   └── savings_percent: 99.8
│
├── cost_breakdown:
│   ├── total_cost: 0.012
│   ├── scan_cost: 0.008
│   ├── compute_cost: 0.003
│   └── network_cost: 0.001
│
└── optimization_hints: [
      {
        "type": "info",
        "message": "Query uses pre-aggregation 'orders_monthly_by_region_category'",
        "impact": "99.8% scan reduction"
      }
    ]
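The savings_percent field follows directly from the two scan sizes in the response. A quick check, using the byte counts from the example above:

```python
# Scan sizes taken from the example explanation response
original_scan_bytes = 5_368_709_120   # ~5 GB base-table scan
preagg_scan_bytes = 10_485_760        # ~10 MB pre-aggregation scan

# Fraction of scanned bytes avoided, expressed as a percentage
savings_percent = round((1 - preagg_scan_bytes / original_scan_bytes) * 100, 1)
print(savings_percent)  # 99.8
```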

Visual Plan Formats

ASCII Format

plan_ascii = await service.explain_query(
    query=query,
    format=ExplainFormat.ASCII
)

# Output:
# ┌─────────────────────────────────────┐
# │             Aggregate               │
# │  GROUP BY: region, category, month  │
# │  Rows: ~450                         │
# └─────────────────┬───────────────────┘
#                   │
#         ┌─────────┴─────────┐
#         │      Filter       │
#         │ status='completed'│
#         │ date >= 2023-07   │
#         │ Rows: ~125,000    │
#         └─────────┬─────────┘
#                   │
#         ┌─────────┴─────────┐
#         │   Scan: orders    │
#         │  Rows: ~500,000   │
#         │  [Using pre-agg]  │
#         └───────────────────┘

Mermaid Format

plan_mermaid = await service.explain_query(
    query=query,
    format=ExplainFormat.MERMAID
)

# Output:
# graph TD
#     A[Aggregate<br/>GROUP BY: region, category, month<br/>~450 rows]
#     B[Filter<br/>status='completed', date >= 2023-07<br/>~125,000 rows]
#     C[Scan: orders<br/>~500,000 rows<br/>Using pre-agg]
#     A --> B
#     B --> C

DOT Format (Graphviz)

plan_dot = await service.explain_query(
    query=query,
    format=ExplainFormat.DOT
)

# Output: Graphviz DOT notation for visualization tools

Compare Query Plans

# Compare two query variations
comparison = await service.compare_plans(
    query_a={
        "measures": ["Orders.revenue"],
        "dimensions": ["Orders.region"],
        "timeDimensions": [{
            "dimension": "Orders.created_at",
            "dateRange": "Last year"
        }]
    },
    query_b={
        "measures": ["Orders.revenue"],
        "dimensions": ["Orders.region"],
        "timeDimensions": [{
            "dimension": "Orders.created_at",
            "dateRange": "Last 30 days"
        }]
    }
)

# PlanComparison:
# ├── query_a_cost: 0.15
# ├── query_b_cost: 0.02
# ├── cost_difference: -87%
# ├── query_a_scan_bytes: "15 GB"
# ├── query_b_scan_bytes: "1.2 GB"
# ├── differences: [
# │     "Query B scans 92% less data due to narrower date range",
# │     "Both queries use the same pre-aggregation"
# │   ]
# └── recommendation: "Use shorter date ranges when possible"
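The cost_difference figure is an ordinary percent change between the two plan costs. A quick check with the values above (the helper name is ours, not part of the API):

```python
def percent_change(before: float, after: float) -> int:
    """Signed percent change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)

# Costs from the PlanComparison output above
print(percent_change(0.15, 0.02))  # -87
```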

Explain SQL Directly

# Explain raw SQL
sql_explanation = await service.explain_sql(
    sql="""
        SELECT region, SUM(revenue) AS total_revenue
        FROM orders
        WHERE status = 'completed'
          AND created_at >= '2024-01-01'
        GROUP BY region
        ORDER BY total_revenue DESC
    """,
    warehouse="postgresql"
)

API Endpoints

Explain Query

POST /api/v1/explain/query
Content-Type: application/json

{
  "query": {
    "measures": ["Orders.revenue"],
    "dimensions": ["Orders.region"],
    "timeDimensions": [{
      "dimension": "Orders.created_at",
      "dateRange": "Last 30 days"
    }]
  },
  "format": "detailed",
  "include_sql": true,
  "include_cost": true
}

Get Plan Visualization

GET /api/v1/explain/query/<query_hash>/visualization?format=mermaid

Compare Plans

POST /api/v1/explain/compare
Content-Type: application/json

{
  "query_a": {...},
  "query_b": {...}
}

Explain SQL

POST /api/v1/explain/sql
Content-Type: application/json

{
  "sql": "SELECT ...",
  "warehouse": "postgresql"
}

Understanding the Logical Plan

Operation Types

Operation   Description                    Performance Impact
Scan        Read data from table/pre-agg   Base cost, affected by filters
Filter      Apply WHERE conditions         Reduces rows, low cost
Project     Select columns                 Very low cost
Join        Combine tables                 Can be expensive
Aggregate   GROUP BY and aggregations      Moderate cost
Sort        ORDER BY                       Can be expensive for large sets
Limit       LIMIT/OFFSET                   Very low cost
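A logical plan is a chain of operation nodes, each with an optional child. A minimal sketch of walking such a chain root-to-leaf, using nested dicts shaped like the explanation response shown earlier (the dict layout is an illustration, not the exact object model):

```python
# Nested dicts mirroring the logical_plan in the example explanation response
plan = {
    "operation": "Aggregate", "estimated_rows": 450,
    "child": {
        "operation": "Filter", "estimated_rows": 125_000,
        "child": {"operation": "Scan", "estimated_rows": 500_000},
    },
}

def walk(node):
    """Yield (operation, estimated_rows) pairs from root to leaf."""
    while node is not None:
        yield node["operation"], node["estimated_rows"]
        node = node.get("child")

for op, rows in walk(plan):
    print(f"{op}: ~{rows:,} rows")
```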

Reading Cardinality Estimates

Scan: orders (~500,000 rows)
└── Filter: status='completed' (~350,000 rows) [30% filtered]
    └── Filter: date >= 2024-01 (~125,000 rows) [64% filtered]
        └── Aggregate: GROUP BY region (~15 rows)
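The bracketed percentages are simply the share of input rows each step removes. A one-line check against the figures above (the helper name is ours):

```python
def filtered_percent(rows_in: int, rows_out: int) -> int:
    """Percentage of input rows removed by a plan step."""
    return round((1 - rows_out / rows_in) * 100)

print(filtered_percent(500_000, 350_000))  # status filter -> 30
print(filtered_percent(350_000, 125_000))  # date filter   -> 64
```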

Understanding Physical Plans

Index Usage

index_usage = explanation.physical_plan.index_usage

# Example output:
# [
#   {"index": "idx_orders_created_at", "used": True, "type": "range_scan"},
#   {"index": "idx_orders_status", "used": True, "type": "equality"},
#   {"index": "idx_orders_customer", "used": False, "reason": "not in query"}
# ]
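Given a list shaped like that example output (the list-of-dicts shape is assumed here), pulling out the indexes a query did not use is a one-liner:

```python
# Index-usage entries shaped like the example output above
index_usage = [
    {"index": "idx_orders_created_at", "used": True, "type": "range_scan"},
    {"index": "idx_orders_status", "used": True, "type": "equality"},
    {"index": "idx_orders_customer", "used": False, "reason": "not in query"},
]

# Names of indexes the planner skipped
unused = [entry["index"] for entry in index_usage if not entry["used"]]
print(unused)  # ['idx_orders_customer']
```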

Join Methods

Method        Best For                        Characteristics
Hash Join     Large tables, equality joins    Memory-intensive, fast
Merge Join    Sorted data, equality joins     I/O efficient
Nested Loop   Small tables, inequality joins  Simple, can be slow
Index Join    Indexed lookup tables           Very fast for small lookups

Scan Types

Scan Type     Description                 Performance
Sequential    Full table scan             Slowest, reads all rows
Index Scan    Use index to find rows      Fast for selective queries
Index Only    Data from index alone       Fastest, no table access
Bitmap Scan   Combine multiple indexes    Good for OR conditions

Pre-Aggregation Analysis

Match Types

# Exact match - pre-agg has exactly the measures/dimensions needed
preagg_analysis.match_type = "exact"

# Superset match - pre-agg has more than needed, will filter
preagg_analysis.match_type = "superset"

# Rollup match - pre-agg can be rolled up (e.g., daily to monthly)
preagg_analysis.match_type = "rollup"

# No match - query must hit base table
preagg_analysis.match_type = "none"
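For programmatic checks, the four match types reduce to one question: can the query avoid the base table? A sketch under that reading (the helper is ours, not part of the API):

```python
def uses_preagg(match_type: str) -> bool:
    """True when the query can be served from a pre-aggregation
    (exact, superset, or rollup match); False for "none"."""
    return match_type in {"exact", "superset", "rollup"}

print(uses_preagg("rollup"))  # True
print(uses_preagg("none"))    # False
```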

Why Pre-Agg Not Used

# Example: Pre-agg not matched
preagg_analysis = {
"matched": False,
"candidates_checked": [
{
"name": "orders_daily_by_region",
"rejected_reason": "missing dimension: product_category"
},
{
"name": "orders_monthly_summary",
"rejected_reason": "granularity too coarse (monthly vs daily)"
}
],
"suggestion": "Create pre-aggregation with dimensions: region, product_category"
}

Optimization Hints

Hint Types

Type         Severity   Description
info         Low        Informational, no action needed
suggestion   Medium     Could improve performance
warning      High       Significant performance issue
critical     Urgent     Query may timeout or fail

Example Hints

optimization_hints = [
    {
        "type": "warning",
        "category": "missing_filter",
        "message": "Query scans entire orders table without partition filter",
        "suggestion": "Add timeDimension filter to leverage partitioning",
        "estimated_improvement": "90% scan reduction"
    },
    {
        "type": "suggestion",
        "category": "preagg_opportunity",
        "message": "This query pattern runs frequently without pre-aggregation",
        "suggestion": "Create pre-aggregation: orders_by_region_category_monthly",
        "estimated_improvement": "95% cost reduction"
    },
    {
        "type": "info",
        "category": "preagg_used",
        "message": "Query uses pre-aggregation 'orders_daily_summary'",
        "impact": "98% scan reduction achieved"
    }
]
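Hints like these can be triaged by mapping each type to its severity rank. A minimal sketch (the ranking dict is our convention; the type values come from the severity table above):

```python
# Lower rank = more urgent (our convention, per the severity table)
SEVERITY = {"critical": 0, "warning": 1, "suggestion": 2, "info": 3}

hints = [
    {"type": "info", "category": "preagg_used"},
    {"type": "warning", "category": "missing_filter"},
    {"type": "suggestion", "category": "preagg_opportunity"},
]

triaged = sorted(hints, key=lambda h: SEVERITY[h["type"]])
print([h["type"] for h in triaged])  # ['warning', 'suggestion', 'info']
```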

Best Practices

When to Use Explain

  1. Debugging slow queries - Understand why a query is slow
  2. Before production - Verify query efficiency
  3. After schema changes - Check impact on execution
  4. Learning - Understand how semantic queries translate to SQL

Reading Plans Effectively

  1. Start from the bottom - Data flows upward
  2. Look at row estimates - Big drops indicate effective filters
  3. Check index usage - Sequential scans on large tables are red flags
  4. Review pre-agg match - Not using available pre-aggs wastes resources

Acting on Hints

  1. Address warnings first - Biggest impact
  2. Review suggestions monthly - Prioritize by frequency
  3. Create pre-aggregations - For repeated patterns
  4. Add indexes - For common filter columns

Configuration

# explain configuration
explain:
  enabled: true

  cache:
    enabled: true
    ttl_seconds: 300

  formats:
    ascii:
      max_depth: 10
      show_costs: true
    mermaid:
      direction: "TD"  # TB, BT, LR, RL
    dot:
      rankdir: "TB"

  hints:
    enabled: true
    categories:
      - missing_filter
      - preagg_opportunity
      - index_suggestion
      - join_optimization

Next Steps