Anomaly Detection

For Data Analysts

Olytix Core's anomaly detection system proactively monitors your metrics and alerts you when unexpected changes occur, helping you catch issues before they impact your business.

Overview

┌─────────────────────────────────────────────────────────────────────┐
│ ANOMALY DETECTION PIPELINE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ MONITORS │───▶│ DETECTORS │───▶│ SCORING │ │
│ │ │ │ │ │ │ │
│ │ Define what │ │ Z-Score │ │ Severity │ │
│ │ to monitor │ │ IQR │ │ Business │ │
│ │ │ │ MAD │ │ Impact │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ STORAGE │◀───│ ALERTS │◀───│ ANALYSIS │ │
│ │ │ │ │ │ │ │
│ │ Historical │ │ Grouping │ │ Root Cause │ │
│ │ Anomalies │ │ Delivery │ │ Correlation │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘

Key Features

Detection Algorithms

Olytix Core supports multiple statistical detection methods:

| Algorithm | Best For | Description |
|-----------|----------|-------------|
| Z-Score | Normal distributions | Measures standard deviations from the mean |
| IQR | Skewed data | Uses the interquartile range for outlier detection |
| MAD | Robust detection | Median Absolute Deviation; resistant to outliers |

Severity Scoring

Each anomaly receives a severity score based on:

  • Statistical Significance - How far from expected values
  • Historical Context - Comparison to past anomalies
  • Duration - How long the anomaly persists
  • Trend - Whether the anomaly is growing or stabilizing

Severity Levels:
├── CRITICAL (0.8 - 1.0): Immediate attention required
├── HIGH (0.6 - 0.8): Significant deviation
├── MEDIUM (0.4 - 0.6): Notable but not urgent
├── LOW (0.2 - 0.4): Minor deviation
└── INFO (0.0 - 0.2): Informational only
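One way to picture how the four factors combine into a single score is a weighted sum mapped onto the level bands above. This is a minimal sketch: the weights and the function names `severity_score` and `severity_level` are illustrative, not Olytix Core's actual scoring formula.

```python
def severity_score(significance, historical, duration, trend,
                   weights=(0.4, 0.2, 0.2, 0.2)):
    """Combine the four factors (each normalised to 0-1) into one score.

    The weights are illustrative placeholders, not the product's real ones.
    """
    return sum(w * f for w, f in zip(weights,
                                     (significance, historical, duration, trend)))

def severity_level(score):
    """Map a 0-1 score onto the documented severity bands."""
    if score >= 0.8:
        return "CRITICAL"
    if score >= 0.6:
        return "HIGH"
    if score >= 0.4:
        return "MEDIUM"
    if score >= 0.2:
        return "LOW"
    return "INFO"
```

For instance, a statistically extreme but short-lived anomaly can land in a lower band than a smaller deviation that keeps growing.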

Business Impact Assessment

Anomalies are assessed for business impact:

ImpactAssessment:
├── revenue_impact: "$45,000 potential revenue at risk"
├── affected_segments: ["Enterprise", "North America"]
├── customer_impact: "~2,500 customers affected"
└── urgency: "High - revenue-critical metric"

Usage

Creating a Monitor

from olytix_core.anomaly.service import AnomalyService
from olytix_core.anomaly.monitors.models import AnomalyMonitor, DetectorType

service = AnomalyService()

# Create a monitor for revenue
monitor = AnomalyMonitor(
    name="Daily Revenue Monitor",
    cube_name="Orders",
    measure_name="revenue",
    detector_type=DetectorType.Z_SCORE,
    sensitivity=2.5,  # standard-deviations threshold
    dimensions=["region", "product_category"],
    schedule="0 9 * * *",  # daily at 9 AM
    enabled=True,
)

registered = await service.register_monitor(monitor)

Running Detection

# Run detection for a specific monitor
result = await service.run_detection(
    monitor_id=monitor.id,
    data=current_values,  # list of metric values
    context=DetectionContext(
        dimension_values={"region": "North America"},
        related_metrics={"orders_count": order_counts},
    ),
)

# Result includes:
# - List of detected anomalies
# - Severity scores
# - Recommended actions

Configuring Alerts

from olytix_core.anomaly.alerts.models import AlertConfig, AlertChannel

# Configure alert delivery
config = AlertConfig(
    monitor_id=monitor.id,
    channels=[
        AlertChannel(type="email", target="data-team@company.com"),
        AlertChannel(type="slack", target="#alerts-channel"),
        AlertChannel(type="webhook", target="https://api.company.com/alerts"),
    ],
    min_severity="HIGH",
    grouping_window_minutes=15,
    escalation_rules=[
        {"after_minutes": 30, "notify": "manager@company.com"},
        {"after_minutes": 60, "notify": "director@company.com"},
    ],
)

Root Cause Analysis

# Analyze potential root causes
analysis = await service.analyze_root_cause(
    anomaly_id=anomaly.id,
    dimension_drill_down=True,
    related_metrics=True,
)

# Returns:
# RootCauseAnalysis(
# primary_cause="North America region showing 45% drop",
# contributing_factors=[
# "Enterprise segment down 60%",
# "Product category 'Software' down 35%"
# ],
# correlated_anomalies=[
# "Orders.count also anomalous (correlation: 0.89)"
# ],
# recommendations=[
# "Investigate North America Enterprise Software sales"
# ]
# )

API Endpoints

Create Monitor

POST /api/v1/anomaly/monitors
Content-Type: application/json

{
  "name": "Revenue Monitor",
  "cube_name": "Orders",
  "measure_name": "revenue",
  "detector_type": "z_score",
  "sensitivity": 2.5,
  "dimensions": ["region"],
  "schedule": "0 * * * *",
  "alert_config": {
    "channels": [
      {"type": "email", "target": "alerts@company.com"}
    ],
    "min_severity": "HIGH"
  }
}

List Anomalies

GET /api/v1/anomaly/detections?
monitor_id=<uuid>&
start_date=2024-01-01&
end_date=2024-01-31&
min_severity=MEDIUM

Get Root Cause Analysis

GET /api/v1/anomaly/detections/<detection_id>/root-cause
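These endpoints can be exercised with any HTTP client. As a minimal illustration of how the documented query parameters fit together, the URLs can be assembled with the standard library (the base host is a placeholder; `detections_url` and `root_cause_url` are hypothetical helper names, not part of the API):

```python
from urllib.parse import urlencode

BASE = "https://olytix.example.com"  # placeholder host

def detections_url(monitor_id, start_date, end_date, min_severity):
    """Build the List Anomalies URL with the documented query parameters."""
    params = urlencode({
        "monitor_id": monitor_id,
        "start_date": start_date,
        "end_date": end_date,
        "min_severity": min_severity,
    })
    return f"{BASE}/api/v1/anomaly/detections?{params}"

def root_cause_url(detection_id):
    """Build the Get Root Cause Analysis URL for one detection."""
    return f"{BASE}/api/v1/anomaly/detections/{detection_id}/root-cause"
```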

Detection Algorithms

Z-Score Detection

Best for normally distributed metrics:

Z-Score = (value - mean) / standard_deviation

Anomaly if: |Z-Score| > sensitivity_threshold

Configuration:

  • sensitivity: Number of standard deviations (default: 2.5)
  • training_size: Historical values for baseline (default: 100)
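The Z-Score rule above can be sketched in plain Python. This is an illustration of the formula, not Olytix Core's implementation; the function name is hypothetical.

```python
from statistics import mean, stdev

def z_score_anomalies(history, values, sensitivity=2.5):
    """Flag values more than `sensitivity` standard deviations from
    the mean of the historical baseline."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:  # flat baseline: any deviation is anomalous
        return [v for v in values if v != mu]
    return [v for v in values if abs(v - mu) / sigma > sensitivity]
```

For a baseline of 90..110 (mean 100, sample stdev ≈ 6.2), a value of 130 sits ~4.8 standard deviations out and is flagged at the default threshold of 2.5, while 100 is not.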

IQR Detection

Best for skewed distributions:

IQR = Q3 - Q1 (75th percentile - 25th percentile)
Lower Bound = Q1 - (sensitivity * IQR)
Upper Bound = Q3 + (sensitivity * IQR)

Anomaly if: value < Lower Bound OR value > Upper Bound

Configuration:

  • sensitivity: IQR multiplier (default: 1.5)
  • training_size: Historical values for quartile calculation
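The IQR bounds above translate directly into code. A minimal sketch (not the product's implementation; `statistics.quantiles` with its default exclusive method stands in for however Olytix Core computes quartiles):

```python
from statistics import quantiles

def iqr_anomalies(history, values, sensitivity=1.5):
    """Flag values outside Q1 - k*IQR .. Q3 + k*IQR of the baseline."""
    q1, _, q3 = quantiles(sorted(history), n=4)  # Q1, median, Q3
    iqr = q3 - q1
    lower, upper = q1 - sensitivity * iqr, q3 + sensitivity * iqr
    return [v for v in values if v < lower or v > upper]
```

Because the bounds are built from quartiles rather than the mean, a long tail in the baseline widens the fence on that side instead of distorting a symmetric threshold.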

MAD Detection

Most robust to existing outliers:

MAD = median(|values - median(values)|)
Modified Z-Score = 0.6745 * (value - median) / MAD

Anomaly if: |Modified Z-Score| > sensitivity_threshold

Configuration:

  • sensitivity: Modified Z-Score threshold (default: 3.0)
  • training_size: Historical values for median calculation
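The MAD formula can likewise be sketched in a few lines (illustrative only; the 0.6745 constant rescales MAD to be comparable to a standard deviation under normality):

```python
from statistics import median

def mad_anomalies(history, values, sensitivity=3.0):
    """Flag values whose modified Z-Score exceeds the threshold."""
    med = median(history)
    mad = median([abs(x - med) for x in history])
    if mad == 0:  # degenerate baseline: majority of values identical
        return [v for v in values if v != med]
    return [v for v in values if abs(0.6745 * (v - med) / mad) > sensitivity]
```

Because both the center (median) and the spread (MAD) ignore extreme values, a handful of past anomalies in the training window barely moves the threshold, which is exactly why MAD is the robust choice.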

Alert Management

Alert Grouping

Related alerts are grouped to prevent alert fatigue:

# Grouping strategies:
# - By monitor: Group all alerts from same monitor
# - By dimension: Group alerts with same dimension values
# - By time: Group alerts within time window
# - By severity: Group alerts of similar severity

grouper = AlertGrouper(
    strategy="dimension",
    window_minutes=15,
    min_group_size=2,
)

Alert Delivery

Delivery Channels:
├── Email: HTML-formatted alert with charts
├── Slack: Interactive message with actions
├── Webhook: JSON payload for custom integrations
├── PagerDuty: For critical alerts
└── SMS: For high-priority notifications

Alert States

Alert Lifecycle:
TRIGGERED → ACKNOWLEDGED → INVESTIGATING → RESOLVED
    │
    └──▶ ESCALATED (if unacknowledged when an escalation rule fires)

Correlation Analysis

Find related metrics affected by the same issue:

correlations = await service.find_correlations(
    anomaly_id=anomaly.id,
    metrics_to_check=["Orders.count", "Orders.avg_value", "Customers.new_signups"],
    time_window_hours=24,
)

# Returns metrics with similar anomalous behavior:
# [
# CorrelatedMetric(name="Orders.count", correlation=0.92, lag_hours=0),
# CorrelatedMetric(name="Customers.new_signups", correlation=0.78, lag_hours=2)
# ]

Best Practices

Choosing a Detector

  1. Z-Score: Use for metrics with stable, normal distributions (e.g., daily page views)
  2. IQR: Use for metrics with outliers or skewed distributions (e.g., order values)
  3. MAD: Use when you suspect historical data contains anomalies

Setting Sensitivity

  • High sensitivity (lower threshold): More alerts, fewer missed anomalies
  • Low sensitivity (higher threshold): Fewer alerts, may miss smaller anomalies

Recommendations:

  • Start with default values
  • Monitor false positive rate
  • Adjust based on business criticality
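The trade-off is easy to see on a toy series with a simple Z-Score rule (purely illustrative; `series` and `flagged` are hypothetical, not part of the Olytix Core API):

```python
from statistics import mean, stdev

# One spike (115) and one mild dip (97) around a ~100 baseline.
series = [100, 102, 98, 101, 99, 100, 115, 100, 97, 103]
mu, sigma = mean(series), stdev(series)

def flagged(threshold):
    """Values more than `threshold` standard deviations from the mean."""
    return [v for v in series if abs(v - mu) / sigma > threshold]
```

At the default threshold of 2.5 only the spike is flagged; raising it to 3.0 silences even that, while dropping it to 0.8 also flags the mild dip. Lower thresholds catch more real anomalies at the cost of more false positives.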

Monitor Design

  1. One metric per monitor: Easier to tune and understand
  2. Include relevant dimensions: Enable drill-down analysis
  3. Set appropriate schedules: Match your data update frequency
  4. Configure escalation: Ensure critical issues get attention

Example: Complete Setup

from olytix_core.anomaly.service import AnomalyService
from olytix_core.anomaly.monitors.models import AnomalyMonitor, DetectorType
from olytix_core.anomaly.alerts.models import AlertConfig, AlertChannel

# Initialize service
service = AnomalyService()

# Create revenue monitor with Z-Score
revenue_monitor = await service.register_monitor(AnomalyMonitor(
    name="Hourly Revenue",
    cube_name="Orders",
    measure_name="revenue",
    detector_type=DetectorType.Z_SCORE,
    sensitivity=2.5,
    dimensions=["region", "product_category"],
    schedule="0 * * * *",  # every hour
))

# Configure alerts
await service.configure_alerts(AlertConfig(
    monitor_id=revenue_monitor.id,
    channels=[
        AlertChannel(type="slack", target="#revenue-alerts"),
        AlertChannel(type="email", target="finance@company.com"),
    ],
    min_severity="MEDIUM",
    grouping_window_minutes=30,
))

# The service will now automatically:
# 1. Run detection every hour
# 2. Score anomalies by severity
# 3. Assess business impact
# 4. Group and deliver alerts
# 5. Store historical anomalies for analysis

Next Steps