Intelligent Query Assistant

For Data Analysts

The Query Assistant provides a natural language interface for data exploration, allowing users to ask questions in plain English and receive optimized semantic queries.

Overview

┌─────────────────────────────────────────────────────────────────────┐
│                     QUERY ASSISTANT ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   "Show me total revenue by region for last month"                  │
│                          │                                          │
│                          ▼                                          │
│   ┌─────────────────────────────────────────────────────────┐      │
│   │                    INTENT DETECTOR                       │      │
│   │   Detects: AGGREGATE intent with time context            │      │
│   └─────────────────────────────────────────────────────────┘      │
│                          │                                          │
│                          ▼                                          │
│   ┌─────────────────────────────────────────────────────────┐      │
│   │                   ENTITY EXTRACTOR                       │      │
│   │   Extracts: revenue (measure), region (dimension),       │      │
│   │             last_month (time period)                     │      │
│   └─────────────────────────────────────────────────────────┘      │
│                          │                                          │
│                          ▼                                          │
│   ┌─────────────────────────────────────────────────────────┐      │
│   │                   ENTITY RESOLVER                        │      │
│   │   Resolves: Orders.revenue, Orders.region,               │      │
│   │             Orders.created_at with date_range            │      │
│   └─────────────────────────────────────────────────────────┘      │
│                          │                                          │
│                          ▼                                          │
│   ┌─────────────────────────────────────────────────────────┐      │
│   │                    QUERY BUILDER                         │      │
│   │   Generates semantic query with measures, dimensions,    │      │
│   │   filters, and time dimensions                           │      │
│   └─────────────────────────────────────────────────────────┘      │
│                          │                                          │
│                          ▼                                          │
│                   Generated Query + Explanation                     │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Key Features

Intent Detection

The assistant automatically detects what type of analysis you want:

Intent	Description	Example Questions
AGGREGATE	Sum, count, average metrics	"What is the total revenue?"
COMPARE	Compare periods or segments	"Compare sales this month vs last month"
TREND	Analyze changes over time	"Show revenue trend over the past year"
RANK	Top/bottom analysis	"Top 10 products by sales"
FILTER	Filter data by conditions	"Orders where status is completed"
EXPLORE	Data discovery	"What metrics are available?"

Entity Extraction

Automatically identifies semantic entities from natural language:

# Example extraction from: "Show total revenue by region for electronics last month"

Entities Extracted:
├── Measures: ["revenue"]
├── Dimensions: ["region"]
├── Filter Values: ["electronics"]
├── Time Period: "last_month"
├── Aggregation: "sum" (from "total")
└── Numbers: []

Smart Autocomplete

Get intelligent suggestions as you type:

Input: "Show rev"
Suggestions:
├── "Show revenue" (measure match)
├── "Show revenue by region" (common pattern)
├── "Show revenue trend" (template match)
└── "Show revenue for last month" (time pattern)

Query Templates

Pre-built templates for common analysis patterns:

Templates Available:
  - "Total {measure} by {dimension}"
  - "Top {n} {dimension} by {measure}"
  - "{measure} trend over {time_period}"
  - "Compare {measure} {period1} vs {period2}"
  - "{measure} breakdown by {dimension}"

Usage

Basic Query Processing

from olytix-core.assistant.engine import QueryAssistant

# Initialize with semantic index
assistant = QueryAssistant(
    semantic_index=semantic_layer.get_index(),
    default_time_dimension="Orders.created_at"
)

# Process a natural language query
response = assistant.process_query("Show me total revenue by region")

# Response includes:
# - Generated query (measures, dimensions, filters)
# - Natural language explanation
# - Visualization recommendation
# - Confidence score

Response Structure

AssistantResponse:
├── response_type: "query_generated"
├── query:
│   ├── measures: ["Orders.revenue"]
│   ├── dimensions: ["Orders.region"]
│   ├── filters: []
│   └── time_dimensions: []
├── narration:
│   ├── summary: "Showing total revenue broken down by region"
│   └── details: ["Aggregating revenue using SUM", "Grouped by region"]
├── visualization:
│   ├── type: "bar_chart"
│   └── config: { x: "region", y: "revenue" }
├── confidence: 0.92
└── warnings: []

Autocomplete

# Get autocomplete suggestions
suggestions = assistant.get_autocomplete("Show revenue by")

# Returns ranked suggestions:
# [
#   AutocompleteSuggestion(text="Show revenue by region", score=0.95),
#   AutocompleteSuggestion(text="Show revenue by product", score=0.88),
#   AutocompleteSuggestion(text="Show revenue by month", score=0.85),
# ]

# Start a conversation
response1 = assistant.process_query(
    "Show revenue",
    user_id="analyst_1"
)
# Query: { measures: ["Orders.revenue"] }

# Continue the conversation
response2 = assistant.process_query(
    "Break it down by region",
    user_id="analyst_1",
    conversation_id=response1.metadata["conversation_id"]
)
# Query: { measures: ["Orders.revenue"], dimensions: ["Orders.region"] }

# Further refinement
response3 = assistant.process_query(
    "Just for last month",
    user_id="analyst_1",
    conversation_id=response1.metadata["conversation_id"]
)
# Query includes time filter for last month

API Endpoints

Process Query

POST /api/v1/assistant/query
Content-Type: application/json

{
  "query": "Show total revenue by region for last month",
  "user_id": "analyst_1",
  "conversation_id": null,
  "options": {
    "include_explanation": true,
    "include_visualization": true
  }
}

Get Autocomplete

GET /api/v1/assistant/autocomplete?input=Show%20rev&limit=5

Get Recommendations

GET /api/v1/assistant/recommendations?user_id=analyst_1

Configuration

Semantic Index

The assistant requires a semantic index mapping natural language terms to semantic model elements:

semantic_index = {
    "measures": {
        "Orders.revenue": {
            "label": "Revenue",
            "type": "sum",
            "cube": "Orders",
            "name": "revenue",
            "description": "Total order revenue",
            "synonyms": ["sales", "income", "earnings"]
        }
    },
    "dimensions": {
        "Orders.region": {
            "label": "Region",
            "type": "string",
            "cube": "Orders",
            "name": "region",
            "synonyms": ["area", "territory", "location"]
        }
    }
}

Time Dimension

Configure the default time dimension for temporal queries:

assistant = QueryAssistant(
    semantic_index=index,
    default_time_dimension="Orders.created_at"
)

Query Explanation

Every generated query includes a human-readable explanation:

narration = assistant.explain_query(query)

# Output:
# NarratedQuery(
#     summary="Showing total revenue grouped by region for the last month",
#     details=[
#         "Calculating the sum of revenue from the Orders cube",
#         "Breaking down results by the region dimension",
#         "Filtering to only include data from the previous calendar month"
#     ],
#     caveats=[
#         "Revenue is calculated before discounts"
#     ]
# )

Visualization Recommendations

The assistant suggests appropriate visualizations based on the query structure:

Query Type	Dimensions	Recommended Viz
Single measure	None	KPI Card
Single measure	1 categorical	Bar Chart
Single measure	1 time	Line Chart
Single measure	2 dimensions	Grouped Bar
Multiple measures	1 dimension	Multi-series Line
Ranking (Top N)	1 dimension	Horizontal Bar

Best Practices

Writing Good Queries

Be specific about measures: "total revenue" vs just "revenue"
Include time context: "last month", "this year", "past 7 days"
Specify dimensions clearly: "by region", "per product category"
Use comparison language: "vs", "compared to", "change from"

Building Your Semantic Index

Add synonyms: Include common terms users might use
Write descriptions: Help the resolver understand context
Define relationships: Link related measures and dimensions
Include examples: Provide sample values for dimensions

Next Steps

Semantic Queries - Understand the query model
Time Intelligence - Time-based analysis
Anomaly Detection - Proactive monitoring

Overview​

Key Features​

Intent Detection​

Entity Extraction​

Smart Autocomplete​

Query Templates​

Usage​

Basic Query Processing​

Response Structure​

Autocomplete​

Conversational Refinement​

API Endpoints​

Process Query​

Get Autocomplete​

Get Recommendations​

Configuration​

Semantic Index​

Time Dimension​

Query Explanation​

Visualization Recommendations​

Best Practices​

Writing Good Queries​

Building Your Semantic Index​

Next Steps​