Anomaly Detection
Olytix Core's anomaly detection system proactively monitors your metrics and alerts you when unexpected changes occur, helping you catch issues before they impact your business.
Overview
┌─────────────────────────────────────────────────────────────────────┐
│ ANOMALY DETECTION PIPELINE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ MONITORS │───▶│ DETECTORS │───▶│ SCORING │ │
│ │ │ │ │ │ │ │
│ │ Define what │ │ Z-Score │ │ Severity │ │
│ │ to monitor │ │ IQR │ │ Business │ │
│ │ │ │ MAD │ │ Impact │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ STORAGE │◀───│ ALERTS │◀───│ ANALYSIS │ │
│ │ │ │ │ │ │ │
│ │ Historical │ │ Grouping │ │ Root Cause │ │
│ │ Anomalies │ │ Delivery │ │ Correlation │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Key Features
Detection Algorithms
Olytix Core supports multiple statistical detection methods:
| Algorithm | Best For | Description |
|---|---|---|
| Z-Score | Normal distributions | Measures standard deviations from mean |
| IQR | Skewed data | Uses interquartile range for outlier detection |
| MAD | Robust detection | Median Absolute Deviation, resistant to outliers |
Severity Scoring
Each anomaly receives a severity score based on:
- Statistical Significance - How far from expected values
- Historical Context - Comparison to past anomalies
- Duration - How long the anomaly persists
- Trend - Whether the anomaly is growing or stabilizing
Severity Levels:
├── CRITICAL (0.8 - 1.0): Immediate attention required
├── HIGH (0.6 - 0.8): Significant deviation
├── MEDIUM (0.4 - 0.6): Notable but not urgent
├── LOW (0.2 - 0.4): Minor deviation
└── INFO (0.0 - 0.2): Informational only
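The bands above can be expressed as a small lookup. This helper is purely illustrative (Olytix Core computes severity internally); boundary scores are treated as belonging to the higher band:

```python
def severity_level(score: float) -> str:
    """Map a severity score in [0.0, 1.0] onto the documented bands."""
    if score >= 0.8:
        return "CRITICAL"
    if score >= 0.6:
        return "HIGH"
    if score >= 0.4:
        return "MEDIUM"
    if score >= 0.2:
        return "LOW"
    return "INFO"
```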
Business Impact Assessment
Anomalies are assessed for business impact:
ImpactAssessment:
├── revenue_impact: "$45,000 potential revenue at risk"
├── affected_segments: ["Enterprise", "North America"]
├── customer_impact: "~2,500 customers affected"
└── urgency: "High - revenue-critical metric"
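The structure above maps naturally onto a small data model. The field names here mirror the example, but they are an assumption; the actual `ImpactAssessment` class shipped with Olytix Core may differ:

```python
from dataclasses import dataclass


@dataclass
class ImpactAssessment:
    """Hypothetical model mirroring the fields shown above."""
    revenue_impact: str
    affected_segments: list[str]
    customer_impact: str
    urgency: str


impact = ImpactAssessment(
    revenue_impact="$45,000 potential revenue at risk",
    affected_segments=["Enterprise", "North America"],
    customer_impact="~2,500 customers affected",
    urgency="High - revenue-critical metric",
)
```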
Usage
Creating a Monitor
from olytix_core.anomaly.service import AnomalyService
from olytix_core.anomaly.monitors.models import AnomalyMonitor, DetectorType
service = AnomalyService()
# Create a monitor for revenue
monitor = AnomalyMonitor(
name="Daily Revenue Monitor",
cube_name="Orders",
measure_name="revenue",
detector_type=DetectorType.Z_SCORE,
sensitivity=2.5, # Standard deviations threshold
dimensions=["region", "product_category"],
schedule="0 9 * * *", # Daily at 9 AM
enabled=True
)
registered = await service.register_monitor(monitor)
Running Detection
# Run detection for a specific monitor
# (DetectionContext import path assumed; adjust to your installation)
from olytix_core.anomaly.monitors.models import DetectionContext
result = await service.run_detection(
monitor_id=monitor.id,
data=current_values, # List of metric values
context=DetectionContext(
dimension_values={"region": "North America"},
related_metrics={"orders_count": order_counts}
)
)
# Result includes:
# - List of detected anomalies
# - Severity scores
# - Recommended actions
Configuring Alerts
from olytix_core.anomaly.alerts.models import AlertConfig, AlertChannel
# Configure alert delivery
config = AlertConfig(
monitor_id=monitor.id,
channels=[
AlertChannel(type="email", target="data-team@company.com"),
AlertChannel(type="slack", target="#alerts-channel"),
AlertChannel(type="webhook", target="https://api.company.com/alerts")
],
min_severity="HIGH",
grouping_window_minutes=15,
escalation_rules=[
{"after_minutes": 30, "notify": "manager@company.com"},
{"after_minutes": 60, "notify": "director@company.com"}
]
)
Root Cause Analysis
# Analyze potential root causes
analysis = await service.analyze_root_cause(
anomaly_id=anomaly.id,
dimension_drill_down=True,
related_metrics=True
)
# Returns:
# RootCauseAnalysis(
# primary_cause="North America region showing 45% drop",
# contributing_factors=[
# "Enterprise segment down 60%",
# "Product category 'Software' down 35%"
# ],
# correlated_anomalies=[
# "Orders.count also anomalous (correlation: 0.89)"
# ],
# recommendations=[
# "Investigate North America Enterprise Software sales"
# ]
# )
API Endpoints
Create Monitor
POST /api/v1/anomaly/monitors
Content-Type: application/json
{
"name": "Revenue Monitor",
"cube_name": "Orders",
"measure_name": "revenue",
"detector_type": "z_score",
"sensitivity": 2.5,
"dimensions": ["region"],
"schedule": "0 * * * *",
"alert_config": {
"channels": [
{"type": "email", "target": "alerts@company.com"}
],
"min_severity": "HIGH"
}
}
List Anomalies
GET /api/v1/anomaly/detections?
monitor_id=<uuid>&
start_date=2024-01-01&
end_date=2024-01-31&
min_severity=MEDIUM
Get Root Cause Analysis
GET /api/v1/anomaly/detections/<detection_id>/root-cause
Detection Algorithms
Z-Score Detection
Best for normally distributed metrics:
Z-Score = (value - mean) / standard_deviation
Anomaly if: |Z-Score| > sensitivity_threshold
Configuration:
sensitivity: Number of standard deviations (default: 2.5)
training_size: Historical values for baseline (default: 100)
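The formula can be sketched in a few lines of standard-library Python. This is an illustration of the math, not Olytix Core's detector:

```python
import statistics


def is_zscore_anomaly(value: float, baseline: list[float], sensitivity: float = 2.5) -> bool:
    """Flag `value` when its z-score against the training window exceeds `sensitivity`."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        return False  # flat baseline: z-score is undefined, so nothing is flagged
    return abs(value - mean) / stdev > sensitivity
```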
IQR Detection
Best for skewed distributions:
IQR = Q3 - Q1 (75th percentile - 25th percentile)
Lower Bound = Q1 - (sensitivity * IQR)
Upper Bound = Q3 + (sensitivity * IQR)
Anomaly if: value < Lower Bound OR value > Upper Bound
Configuration:
sensitivity: IQR multiplier (default: 1.5)
training_size: Historical values for quartile calculation
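A stdlib sketch of the same bounds check, using `statistics.quantiles` for Q1 and Q3 (illustrative only, not the service's implementation):

```python
import statistics


def is_iqr_anomaly(value: float, baseline: list[float], sensitivity: float = 1.5) -> bool:
    """Flag `value` when it falls outside the IQR fences of the training window."""
    q1, _, q3 = statistics.quantiles(baseline, n=4)  # quartile cut points
    iqr = q3 - q1
    lower = q1 - sensitivity * iqr
    upper = q3 + sensitivity * iqr
    return value < lower or value > upper
```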
MAD Detection
Most robust to existing outliers:
MAD = median(|values - median(values)|)
Modified Z-Score = 0.6745 * (value - median) / MAD
Anomaly if: |Modified Z-Score| > sensitivity_threshold
Configuration:
sensitivity: Modified Z-Score threshold (default: 3.0)
training_size: Historical values for median calculation
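The same test in stdlib Python; note that a stray outlier in the baseline barely moves the median or the MAD, which is what makes this detector robust (illustrative sketch only):

```python
import statistics


def is_mad_anomaly(value: float, baseline: list[float], sensitivity: float = 3.0) -> bool:
    """Modified z-score test; medians make it robust to outliers in `baseline`."""
    med = statistics.median(baseline)
    mad = statistics.median([abs(v - med) for v in baseline])
    if mad == 0:
        return value != med  # degenerate window: most values identical, flag any deviation
    return abs(0.6745 * (value - med) / mad) > sensitivity
```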
Alert Management
Alert Grouping
Related alerts are grouped to prevent alert fatigue:
# Grouping strategies:
# - By monitor: Group all alerts from same monitor
# - By dimension: Group alerts with same dimension values
# - By time: Group alerts within time window
# - By severity: Group alerts of similar severity
from olytix_core.anomaly.alerts.grouping import AlertGrouper  # import path assumed

grouper = AlertGrouper(
strategy="dimension",
window_minutes=15,
min_group_size=2
)
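A minimal sketch of the time-window strategy, assuming each alert is a dict carrying a `triggered_at` timestamp (the real `AlertGrouper` logic is internal to Olytix Core):

```python
from datetime import datetime, timedelta


def group_by_time(alerts: list[dict], window_minutes: int = 15) -> list[list[dict]]:
    """Bucket alerts so each group spans at most `window_minutes` from its first alert."""
    groups: list[list[dict]] = []
    current: list[dict] = []
    for alert in sorted(alerts, key=lambda a: a["triggered_at"]):
        if current and alert["triggered_at"] - current[0]["triggered_at"] > timedelta(minutes=window_minutes):
            groups.append(current)  # window exceeded: close the current group
            current = []
        current.append(alert)
    if current:
        groups.append(current)
    return groups
```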
Alert Delivery
Delivery Channels:
├── Email: HTML-formatted alert with charts
├── Slack: Interactive message with actions
├── Webhook: JSON payload for custom integrations
├── PagerDuty: For critical alerts
└── SMS: For high-priority notifications
Alert States
Alert Lifecycle:
TRIGGERED → ACKNOWLEDGED → INVESTIGATING → RESOLVED
    ↓
ESCALATED
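The lifecycle can be captured as a transition table. One assumption here: ESCALATED branches off an unacknowledged TRIGGERED alert (consistent with the time-based `escalation_rules` shown earlier) and returns to the main flow via ACKNOWLEDGED; the actual state machine lives inside Olytix Core's alert service:

```python
# Illustrative transition table for the documented lifecycle.
TRANSITIONS: dict[str, set[str]] = {
    "TRIGGERED": {"ACKNOWLEDGED", "ESCALATED"},
    "ESCALATED": {"ACKNOWLEDGED"},
    "ACKNOWLEDGED": {"INVESTIGATING"},
    "INVESTIGATING": {"RESOLVED"},
    "RESOLVED": set(),  # terminal state
}


def advance(state: str, new_state: str) -> str:
    """Validate and apply a lifecycle transition."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```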
Correlation Analysis
Find related metrics affected by the same issue:
correlations = await service.find_correlations(
anomaly_id=anomaly.id,
metrics_to_check=["Orders.count", "Orders.avg_value", "Customers.new_signups"],
time_window_hours=24
)
# Returns metrics with similar anomalous behavior:
# [
# CorrelatedMetric(name="Orders.count", correlation=0.92, lag_hours=0),
# CorrelatedMetric(name="Customers.new_signups", correlation=0.78, lag_hours=2)
# ]
Best Practices
Choosing a Detector
- Z-Score: Use for metrics with stable, normal distributions (e.g., daily page views)
- IQR: Use for metrics with outliers or skewed distributions (e.g., order values)
- MAD: Use when you suspect historical data contains anomalies
Setting Sensitivity
- Lower `sensitivity` value (tighter threshold): More alerts, fewer missed anomalies
- Higher `sensitivity` value (looser threshold): Fewer alerts, may miss smaller anomalies
Recommendations:
- Start with default values
- Monitor false positive rate
- Adjust based on business criticality
Monitor Design
- One metric per monitor: Easier to tune and understand
- Include relevant dimensions: Enable drill-down analysis
- Set appropriate schedules: Match your data update frequency
- Configure escalation: Ensure critical issues get attention
Example: Complete Setup
from olytix_core.anomaly.service import AnomalyService
from olytix_core.anomaly.monitors.models import AnomalyMonitor, DetectorType
from olytix_core.anomaly.alerts.models import AlertConfig, AlertChannel
# Initialize service
service = AnomalyService()
# Create revenue monitor with Z-Score
revenue_monitor = await service.register_monitor(AnomalyMonitor(
name="Hourly Revenue",
cube_name="Orders",
measure_name="revenue",
detector_type=DetectorType.Z_SCORE,
sensitivity=2.5,
dimensions=["region", "product_category"],
schedule="0 * * * *" # Every hour
))
# Configure alerts
await service.configure_alerts(AlertConfig(
monitor_id=revenue_monitor.id,
channels=[
AlertChannel(type="slack", target="#revenue-alerts"),
AlertChannel(type="email", target="finance@company.com")
],
min_severity="MEDIUM",
grouping_window_minutes=30
))
# The service will now automatically:
# 1. Run detection every hour
# 2. Score anomalies by severity
# 3. Assess business impact
# 4. Group and deliver alerts
# 5. Store historical anomalies for analysis
Next Steps
- Query Assistant - Investigate anomalies with natural language
- Data Profiling - Understand your data distributions
- Monitoring & Logging - Operational monitoring