Skip to main content

Intelligent Query Assistant

For Data Analysts

The Query Assistant provides a natural language interface for data exploration, allowing users to ask questions in plain English and receive optimized semantic queries.

Overview

┌─────────────────────────────────────────────────────────────────────┐
│ QUERY ASSISTANT ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ "Show me total revenue by region for last month" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ INTENT DETECTOR │ │
│ │ Detects: AGGREGATE intent with time context │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ ENTITY EXTRACTOR │ │
│ │ Extracts: revenue (measure), region (dimension), │ │
│ │ last_month (time period) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ ENTITY RESOLVER │ │
│ │ Resolves: Orders.revenue, Orders.region, │ │
│ │ Orders.created_at with date_range │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ QUERY BUILDER │ │
│ │ Generates semantic query with measures, dimensions, │ │
│ │ filters, and time dimensions │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Generated Query + Explanation │
│ │
└─────────────────────────────────────────────────────────────────────┘

Key Features

Intent Detection

The assistant automatically detects what type of analysis you want:

IntentDescriptionExample Questions
AGGREGATESum, count, average metrics"What is the total revenue?"
COMPARECompare periods or segments"Compare sales this month vs last month"
TRENDAnalyze changes over time"Show revenue trend over the past year"
RANKTop/bottom analysis"Top 10 products by sales"
FILTERFilter data by conditions"Orders where status is completed"
EXPLOREData discovery"What metrics are available?"

Entity Extraction

Automatically identifies semantic entities from natural language:

# Example extraction from: "Show total revenue by region for electronics last month"

Entities Extracted:
├── Measures: ["revenue"]
├── Dimensions: ["region"]
├── Filter Values: ["electronics"]
├── Time Period: "last_month"
├── Aggregation: "sum" (from "total")
└── Numbers: []

Smart Autocomplete

Get intelligent suggestions as you type:

Input: "Show rev"
Suggestions:
├── "Show revenue" (measure match)
├── "Show revenue by region" (common pattern)
├── "Show revenue trend" (template match)
└── "Show revenue for last month" (time pattern)

Query Templates

Pre-built templates for common analysis patterns:

Templates Available:
- "Total {measure} by {dimension}"
- "Top {n} {dimension} by {measure}"
- "{measure} trend over {time_period}"
- "Compare {measure} {period1} vs {period2}"
- "{measure} breakdown by {dimension}"

Usage

Basic Query Processing

from olytix-core.assistant.engine import QueryAssistant

# Initialize with semantic index
assistant = QueryAssistant(
semantic_index=semantic_layer.get_index(),
default_time_dimension="Orders.created_at"
)

# Process a natural language query
response = assistant.process_query("Show me total revenue by region")

# Response includes:
# - Generated query (measures, dimensions, filters)
# - Natural language explanation
# - Visualization recommendation
# - Confidence score

Response Structure

AssistantResponse:
├── response_type: "query_generated"
├── query:
│ ├── measures: ["Orders.revenue"]
│ ├── dimensions: ["Orders.region"]
│ ├── filters: []
│ └── time_dimensions: []
├── narration:
│ ├── summary: "Showing total revenue broken down by region"
│ └── details: ["Aggregating revenue using SUM", "Grouped by region"]
├── visualization:
│ ├── type: "bar_chart"
│ └── config: { x: "region", y: "revenue" }
├── confidence: 0.92
└── warnings: []

Autocomplete

# Get autocomplete suggestions
suggestions = assistant.get_autocomplete("Show revenue by")

# Returns ranked suggestions:
# [
# AutocompleteSuggestion(text="Show revenue by region", score=0.95),
# AutocompleteSuggestion(text="Show revenue by product", score=0.88),
# AutocompleteSuggestion(text="Show revenue by month", score=0.85),
# ]

Conversational Refinement

# Start a conversation
response1 = assistant.process_query(
"Show revenue",
user_id="analyst_1"
)
# Query: { measures: ["Orders.revenue"] }

# Continue the conversation
response2 = assistant.process_query(
"Break it down by region",
user_id="analyst_1",
conversation_id=response1.metadata["conversation_id"]
)
# Query: { measures: ["Orders.revenue"], dimensions: ["Orders.region"] }

# Further refinement
response3 = assistant.process_query(
"Just for last month",
user_id="analyst_1",
conversation_id=response1.metadata["conversation_id"]
)
# Query includes time filter for last month

API Endpoints

Process Query

POST /api/v1/assistant/query
Content-Type: application/json

{
"query": "Show total revenue by region for last month",
"user_id": "analyst_1",
"conversation_id": null,
"options": {
"include_explanation": true,
"include_visualization": true
}
}

Get Autocomplete

GET /api/v1/assistant/autocomplete?input=Show%20rev&limit=5

Get Recommendations

GET /api/v1/assistant/recommendations?user_id=analyst_1

Configuration

Semantic Index

The assistant requires a semantic index mapping natural language terms to semantic model elements:

semantic_index = {
"measures": {
"Orders.revenue": {
"label": "Revenue",
"type": "sum",
"cube": "Orders",
"name": "revenue",
"description": "Total order revenue",
"synonyms": ["sales", "income", "earnings"]
}
},
"dimensions": {
"Orders.region": {
"label": "Region",
"type": "string",
"cube": "Orders",
"name": "region",
"synonyms": ["area", "territory", "location"]
}
}
}

Time Dimension

Configure the default time dimension for temporal queries:

assistant = QueryAssistant(
semantic_index=index,
default_time_dimension="Orders.created_at"
)

Query Explanation

Every generated query includes a human-readable explanation:

narration = assistant.explain_query(query)

# Output:
# NarratedQuery(
# summary="Showing total revenue grouped by region for the last month",
# details=[
# "Calculating the sum of revenue from the Orders cube",
# "Breaking down results by the region dimension",
# "Filtering to only include data from the previous calendar month"
# ],
# caveats=[
# "Revenue is calculated before discounts"
# ]
# )

Visualization Recommendations

The assistant suggests appropriate visualizations based on the query structure:

Query TypeDimensionsRecommended Viz
Single measureNoneKPI Card
Single measure1 categoricalBar Chart
Single measure1 timeLine Chart
Single measure2 dimensionsGrouped Bar
Multiple measures1 dimensionMulti-series Line
Ranking (Top N)1 dimensionHorizontal Bar

Best Practices

Writing Good Queries

  1. Be specific about measures: "total revenue" vs just "revenue"
  2. Include time context: "last month", "this year", "past 7 days"
  3. Specify dimensions clearly: "by region", "per product category"
  4. Use comparison language: "vs", "compared to", "change from"

Building Your Semantic Index

  1. Add synonyms: Include common terms users might use
  2. Write descriptions: Help the resolver understand context
  3. Define relationships: Link related measures and dimensions
  4. Include examples: Provide sample values for dimensions

Next Steps