AI/ML Integration
AI and machine learning require high-quality, well-documented data. Olytix Core provides the governance, consistency, and access controls needed to safely power AI initiatives while maintaining data trust.
The AI Data Challenge
AI projects often fail not because of modeling problems, but because of data issues:
Common AI Project Failures
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Data Quality (45%)
├── Inconsistent definitions
├── Missing values
├── Outdated data
└── No documentation
Data Access (30%)
├── Can't find the right data
├── No access permissions
├── Data silos
└── Security concerns
Data Governance (25%)
├── No lineage for model inputs
├── Can't explain predictions
├── Compliance violations
└── No version control
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Olytix Core AI Solution
Governed Data for AI
┌─────────────────────────────────────────────────────────────────────┐
│ AI-Ready Data Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Raw Data Olytix Core Semantic AI/ML Applications │
│ Sources Layer (Governed Access) │
│ │
│ ┌─────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ CRM │────►│ │────►│ Natural Language Query │ │
│ └─────────┘ │ │ └─────────────────────────┘ │
│ │ Metrics │ │
│ ┌─────────┐ │ Cubes │ ┌─────────────────────────┐ │
│ │ ERP │────►│ Lineage │────►│ Predictive Models │ │
│ └─────────┘ │ Security │ └─────────────────────────┘ │
│ │ │ │
│ ┌─────────┐ │ │ ┌─────────────────────────┐ │
│ │ Product │────►│ │────►│ Recommendation Engine │ │
│ └─────────┘ └─────────────┘ └─────────────────────────┘ │
│ │ │
│ ▼ │
│ Complete Audit Trail │
│ Model Input Lineage │
│ Explainable AI Ready │
└─────────────────────────────────────────────────────────────────────┘
Natural Language Querying
Ask Questions in Plain English
Olytix Core's AI integration enables natural language data access:
User: "What was our revenue last quarter compared to the same quarter last year?"
Olytix Core AI Translation:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Understanding:
• Metric: revenue (net_revenue)
• Time Period: Q4 2024
• Comparison: Q4 2023 (same quarter prior year)
Generated Query:
{
  "metrics": ["net_revenue"],
  "dimensions": ["orders.order_date.quarter"],
  "time_intelligence": {
    "compare_to": "same_period_prior_year"
  },
  "filters": [
    {"dimension": "orders.order_date.quarter", "operator": "equals", "value": "2024-Q4"}
  ]
}
Result:
Q4 2024: $4.2M
Q4 2023: $3.8M
Change: +10.5%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
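The `same_period_prior_year` comparison in the result above is plain percent change; a minimal sketch (the function is illustrative, not part of the Olytix Core API):

```python
def period_over_period_change(current: float, prior: float) -> float:
    """Percent change of the current period versus the prior period."""
    if prior == 0:
        raise ValueError("prior period value must be non-zero")
    return (current - prior) / prior * 100

# Q4 2024 vs. Q4 2023, in millions of dollars
change = period_over_period_change(4.2, 3.8)
print(f"{change:+.1f}%")  # +10.5%
```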
Semantic Understanding
The AI understands your business terminology:
# Configure business terminology mappings
ai:
  terminology:
    - term: "sales"
      maps_to: "net_revenue"
      context: "Usually refers to net revenue unless gross is specified"
    - term: "customers"
      maps_to: "customers.active_count"
      context: "Default to active customers"
    - term: "last quarter"
      maps_to: "prior_quarter"
      context: "Most recent completed quarter"
    - term: "growth"
      maps_to: "period_over_period_change"
      context: "Compare to prior period"
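Resolution against these mappings can be pictured as a lookup over the question text. A toy sketch (the real translation layer also uses the `context` hints and an LLM; `resolve_terms` is a hypothetical helper, not an Olytix Core API):

```python
# Mirror of the terminology config above, as a plain dict
TERMINOLOGY = {
    "sales": "net_revenue",
    "customers": "customers.active_count",
    "last quarter": "prior_quarter",
    "growth": "period_over_period_change",
}

def resolve_terms(question: str) -> dict:
    """Return the canonical metric/concept for each business term found."""
    lowered = question.lower()
    return {
        term: canonical
        for term, canonical in TERMINOLOGY.items()
        if term in lowered
    }

print(resolve_terms("What was sales growth last quarter?"))
# {'sales': 'net_revenue', 'last quarter': 'prior_quarter', 'growth': 'period_over_period_change'}
```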
Semantic Search
Finding Relevant Metrics
User: "I need data about customer satisfaction"
Olytix Core Semantic Search Results:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Highly Relevant (>90% match):
├── nps_score (Net Promoter Score)
│ Description: "Customer satisfaction score from -100 to 100"
│ Cube: customers
│ Certified: ✓
│
├── customer_satisfaction_rating
│ Description: "Average support ticket satisfaction (1-5)"
│ Cube: support
│ Certified: ✓
│
└── csat_score
Description: "Post-interaction satisfaction percentage"
Cube: interactions
Certified: ✓
Related Metrics:
├── customer_churn_rate (Customer retention indicator)
├── support_ticket_count (Volume of issues)
└── avg_resolution_time (Service quality)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Embedding-Based Discovery
# Configure embeddings for semantic search
embeddings:
  model: "text-embedding-3-small"
  index:
    - type: metrics
      fields: [name, description, calculation_notes]
    - type: dimensions
      fields: [name, description, example_values]
    - type: cubes
      fields: [name, description, business_context]
  search:
    min_relevance: 0.7
    max_results: 10
    boost_certified: 1.2
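Conceptually, this search ranks indexed items by vector similarity to the query, applies the `boost_certified` multiplier, and drops anything below `min_relevance`. A self-contained sketch with toy vectors (real embeddings would come from the configured `text-embedding-3-small` model, and `rank_metrics` is illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_metrics(query_vec, indexed, min_relevance=0.7,
                 boost_certified=1.2, max_results=10):
    """Score indexed items against a query embedding, boosting certified ones."""
    scored = []
    for item in indexed:
        score = cosine_similarity(query_vec, item["embedding"])
        if score < min_relevance:
            continue  # below the relevance floor
        if item.get("certified"):
            score *= boost_certified  # certified metrics rank higher
        scored.append((score, item["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_results]]
```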
Machine Learning Features
Feature Store Integration
from olytix_core import OlytixCoreClient

client = OlytixCoreClient("http://localhost:8000")

# Get features for ML model
def get_customer_features(customer_ids: list):
    """Get features for churn prediction model."""
    return client.query(
        measures=[
            "customers.total_revenue",
            "customers.order_count",
            "customers.avg_order_value",
            "customers.days_since_last_order",
            "customers.support_ticket_count",
            "customers.nps_score"
        ],
        dimensions=[
            "customers.customer_id",
            "customers.segment",
            "customers.tenure_months"
        ],
        filters=[
            {"dimension": "customers.customer_id", "operator": "inList", "value": customer_ids}
        ]
    ).to_dataframe()

# Use in ML pipeline
features = get_customer_features(customer_list)
predictions = model.predict(features)
Model Input Documentation
Every ML model input is documented:
Model: Customer Churn Prediction v2.3
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Input Features (from Olytix Core):
────────────────────────────────────────────────────────────
Feature │ Source Metric │ Lineage
─────────────────────────┼──────────────────────────┼─────────
total_revenue │ customers.total_revenue │ ✓ Traced
order_count │ customers.order_count │ ✓ Traced
days_since_last_order │ customers.recency │ ✓ Traced
support_tickets │ support.ticket_count │ ✓ Traced
nps_score │ customers.nps_score │ ✓ Traced
segment │ customers.segment │ ✓ Traced
Data Quality:
────────────────────────────────────────────────────────────
• All features from certified metrics
• Complete lineage from source to model
• 99.8% feature coverage (0.2% null values)
• Last training data: 2024-01-15
Governance:
────────────────────────────────────────────────────────────
• Model owner: Data Science Team
• Last audit: 2024-01-10
• Bias testing: Passed
• Explainability: SHAP values documented
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
LLM Integration
Retrieval-Augmented Generation (RAG)
# Configure RAG for business intelligence
rag:
  knowledge_base:
    sources:
      - type: metric_definitions
        include_lineage: true
        include_examples: true
      - type: business_glossary
        include_context: true
      - type: query_history
        filter: successful_only
        max_age_days: 90
  retrieval:
    model: "text-embedding-3-small"
    top_k: 5
    min_relevance: 0.75
  generation:
    model: "gpt-4"
    temperature: 0.1
    system_prompt: |
      You are a business intelligence assistant with access to
      the company's semantic layer. Use the provided context
      to answer questions accurately. Always cite the metrics
      and their definitions.
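The retrieval settings above amount to: score knowledge-base snippets, keep the top_k above min_relevance, and prepend them to the question as context for the generation model. A minimal sketch of that assembly step (prompt wording and function name are illustrative; retrieval scores would come from embedding similarity):

```python
def build_rag_prompt(question, documents, top_k=5, min_relevance=0.75):
    """Assemble an LLM prompt from the highest-relevance snippets.

    `documents` is a list of (relevance, text) pairs that a retrieval
    step has already scored.
    """
    relevant = sorted(
        (d for d in documents if d[0] >= min_relevance), reverse=True
    )[:top_k]
    context = "\n".join(f"- {text}" for _, text in relevant)
    return (
        "Use the provided context to answer accurately. "
        "Always cite the metrics and their definitions.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```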
Conversational Analytics
User: "Why did revenue drop in December?"
Olytix Core AI Assistant:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Let me analyze the December revenue decline.
Revenue Analysis - December 2024:
• Total Revenue: $3.8M (↓15% vs November)
• Expected: $4.5M based on seasonal pattern
Contributing Factors:
1. Order Volume: ↓12% (3,200 vs 3,650 orders)
2. Average Order Value: ↓4% ($1,188 vs $1,233)
3. Refund Rate: ↑2pp (5% vs 3%)
Breakdown by Segment:
• Enterprise: $2.1M (↓8%) - 2 large renewals delayed to January
• Mid-Market: $1.2M (↓18%) - Holiday slowdown
• SMB: $0.5M (↓25%) - Typical seasonal pattern
Comparison to Prior Year December:
• 2024: $3.8M
• 2023: $3.5M
• YoY Growth: +8.6% (still positive)
Recommendation:
The December decline appears to be primarily seasonal, with
some impact from delayed enterprise renewals. The YoY comparison
shows healthy growth. Monitor January for enterprise renewals.
Sources: net_revenue, order_count, avg_order_value, refund_rate
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AI Governance
Model Input Lineage
# Track which data feeds which models
model_registry:
  - name: churn_prediction_v2
    inputs:
      metrics:
        - customers.total_revenue
        - customers.order_count
        - customers.nps_score
      lineage: required
    outputs:
      - name: churn_probability
        type: float
        range: [0, 1]
    governance:
      owner: data-science-team
      review_cycle: quarterly
      bias_testing: required
      explainability: shap
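A registry like this lends itself to automated checks at registration time. A hypothetical validator sketch over the same field names (the rules shown are illustrative, not Olytix Core's built-in policy):

```python
def validate_model_registration(entry: dict) -> list:
    """Return a list of problems with a model_registry entry; empty means valid."""
    problems = []
    if not entry.get("name"):
        problems.append("model must have a name")
    inputs = entry.get("inputs", {})
    if not inputs.get("metrics"):
        problems.append("model must declare its input metrics")
    if inputs.get("lineage") != "required":
        problems.append("input lineage must be required")
    governance = entry.get("governance", {})
    for field in ("owner", "review_cycle", "bias_testing"):
        if field not in governance:
            problems.append(f"governance.{field} is missing")
    return problems
```

Running such a check in CI before a model is deployed keeps undocumented inputs from ever reaching production.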
AI Access Controls
security:
  ai_access:
    # Which AI systems can access what
    natural_language_query:
      allowed_users: all
      restricted_metrics:
        - compensation_data
        - pii_metrics
      audit: true
    ml_feature_access:
      allowed_services:
        - churn-prediction-service
        - recommendation-engine
      require_model_registration: true
      audit: true
    llm_context:
      allowed_data:
        - metric_definitions
        - business_glossary
        - anonymized_examples
      prohibited:
        - raw_customer_data
        - financial_details
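The `restricted_metrics` policy amounts to intersecting the requested metrics with a deny list and rejecting the query on any overlap. An illustrative sketch (actual enforcement happens server-side in Olytix Core; this helper is hypothetical):

```python
RESTRICTED = {"compensation_data", "pii_metrics"}

def check_nlq_access(requested_metrics, restricted=RESTRICTED):
    """Reject a natural-language query that touches restricted metrics."""
    blocked = sorted(set(requested_metrics) & set(restricted))
    if blocked:
        raise PermissionError(
            f"restricted metrics requested: {', '.join(blocked)}"
        )
    return True
```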
Explainability
Prediction Explanation - Customer: ACME Corp
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Churn Prediction: 72% (High Risk)
Top Contributing Factors:
────────────────────────────────────────────────────────────
Factor │ Value │ Impact │ Source
──────────────────────────┼────────────┼───────────┼──────────
Days since last order │ 45 days │ +0.25 │ customers.recency
Support tickets (90d) │ 8 tickets │ +0.18 │ support.count
NPS score │ 6 (passive)│ +0.12 │ customers.nps
Login frequency trend │ ↓ 40% │ +0.10 │ usage.logins
Contract renewal │ 30 days │ +0.07 │ contracts.renewal
──────────────────────────┼────────────┼───────────┼──────────
│ │ Total: 0.72│
Data Sources (All Traced):
• customers cube → dim_customers → CRM sync
• support cube → fct_tickets → Zendesk API
• usage cube → fct_events → Product analytics
Governance:
• Model version: v2.3
• Training date: 2024-01-01
• Bias audit: Passed (2024-01-10)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
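The per-factor impacts in the table are additive, which is what makes the prediction auditable: they sum to the 0.72 churn probability. (In standard SHAP usage, values sum to the prediction minus a base value; the base is omitted here for simplicity.) A quick check:

```python
# Factor impacts from the explanation above; in practice these would be
# SHAP values computed against the model, not hand-entered numbers.
contributions = {
    "days_since_last_order": 0.25,
    "support_tickets_90d": 0.18,
    "nps_score": 0.12,
    "login_frequency_trend": 0.10,
    "contract_renewal_window": 0.07,
}

churn_probability = round(sum(contributions.values()), 2)
print(churn_probability)  # 0.72
```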
Implementation Examples
Python SDK for AI
from olytix_core import OlytixCoreClient
from olytix_core.ai import NaturalLanguageQuery, SemanticSearch

client = OlytixCoreClient("http://localhost:8000")

# Natural language query
nlq = NaturalLanguageQuery(client)
result = nlq.query("Show me top 10 customers by revenue this year")

# Semantic search for metrics
search = SemanticSearch(client)
metrics = search.find("customer satisfaction")

# Get embeddings for custom use
embeddings = client.get_embeddings(
    texts=["revenue", "customer count", "churn rate"],
    model="text-embedding-3-small"
)
REST API for AI
# Natural language query
curl -X POST http://localhost:8000/api/v1/ai/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the monthly revenue trend?",
    "context": {
      "time_range": "last 12 months"
    }
  }'

# Semantic search
curl -X POST http://localhost:8000/api/v1/ai/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "customer satisfaction metrics",
    "types": ["metrics", "dimensions"],
    "limit": 10
  }'
Best Practices
AI Data Quality
- Use certified metrics for ML features
- Document feature definitions clearly
- Track model input lineage end-to-end
- Version your feature sets with models
AI Governance
- Register all models that consume Olytix Core data
- Audit AI data access regularly
- Test for bias using documented methods
- Maintain explainability for all predictions
Next Steps
Ready to power AI with Olytix Core?
AI Success
The best AI models are built on trusted, well-documented data. Invest in data quality and governance before scaling AI initiatives.