Data Governance
Olytix Core's data governance features help organizations maintain data quality, establish clear ownership, and ensure compliance through certification workflows, business glossaries, and comprehensive audit capabilities.
Overview
┌─────────────────────────────────────────────────────────────────────┐
│ DATA GOVERNANCE FRAMEWORK │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ CERTIFICATION│ │ GLOSSARY │ │ OWNERSHIP │ │
│ │ │ │ │ │ │ │
│ │ Draft │ │ Terms │ │ Owners │ │
│ │ Review │ │ Definitions │ │ Stewards │ │
│ │ Certified │ │ Synonyms │ │ Teams │ │
│ │ Deprecated │ │ Relationships│ │ Contacts │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │ │ │ │
│ └───────────────────┼───────────────────┘ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ ARTIFACT METADATA │ │
│ │ │ │
│ │ Cubes • Measures • Dimensions • Metrics • Models │ │
│ │ │ │
│ │ Status • Owner • Tags • Description • Version • Links │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ AUDIT │ │ SEARCH & │ │ COMPLIANCE │ │
│ │ │ │ DISCOVERY │ │ │ │
│ │ Access logs │ │ Full-text │ │ Policies │ │
│ │ Changes │ │ Filters │ │ Reports │ │
│ │ Exports │ │ Browse │ │ Attestation │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Key Features
Certification Workflow
Establish trust in your data through a formal certification process:
| Status | Description | Badge |
|---|---|---|
| Draft | In development, not ready for use | Gray |
| Pending Review | Submitted for certification | Yellow |
| Certified | Approved, trusted for production use | Green |
| Deprecated | Being phased out, use alternatives | Orange |
| Archived | No longer available | Red |
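The lifecycle in the table can be read as a small state machine. The sketch below is illustrative only: the `Status` enum and the `ALLOWED` transition map are assumptions made for this example, not the definition of `CertificationStatus` in Olytix Core, and the transitions the platform actually permits may differ.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical stand-in for the statuses listed in the table above.
    DRAFT = "draft"
    PENDING_REVIEW = "pending_review"
    CERTIFIED = "certified"
    DEPRECATED = "deprecated"
    ARCHIVED = "archived"


# Assumed transitions; the real workflow rules may be stricter or looser.
ALLOWED = {
    Status.DRAFT: {Status.PENDING_REVIEW},
    Status.PENDING_REVIEW: {Status.CERTIFIED, Status.DRAFT},
    Status.CERTIFIED: {Status.DEPRECATED},
    Status.DEPRECATED: {Status.ARCHIVED},
    Status.ARCHIVED: set(),
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True if the assumed workflow permits current -> target."""
    return target in ALLOWED[current]
```

Under these assumptions a draft cannot jump straight to certified; it has to pass through review first.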
Business Glossary
Maintain a single source of truth for business terminology:
- Term definitions - Clear, authoritative definitions
- Synonyms - Alternative names and abbreviations
- Related terms - Connections between concepts
- Artifact links - Map terms to measures/dimensions
Ownership Model
Assign clear responsibility for data assets:
- Data Owners - Accountable for data quality
- Data Stewards - Day-to-day management
- Technical Owners - Implementation responsibility
- Teams - Group ownership for shared assets
Audit & Compliance
Track all changes and access:
- Change history - Who changed what, when
- Access logs - Who queried what data
- Export tracking - Data leaving the platform
- Compliance reports - Regulatory reporting
Usage
Certification Management
# Python module names cannot contain hyphens; the underscore form below is
# assumed to be the importable package name.
from olytix_core.governance.service import GovernanceService
from olytix_core.governance.models import CertificationStatus
service = GovernanceService()
# The calls below use await, so run them inside an async function.
# Submit artifact for certification
await service.submit_for_certification(
    artifact_type="measure",
    artifact_id="Orders.revenue",
    submitted_by="analyst_123",
    notes="Validated against finance system, matches within 0.1%"
)
# Review and certify (requires reviewer role)
await service.certify_artifact(
    artifact_type="measure",
    artifact_id="Orders.revenue",
    certified_by="data_steward_456",
    certification_notes="Approved after validation review",
    valid_until="2025-01-15"  # Optional expiration
)
# Deprecate an artifact
await service.deprecate_artifact(
    artifact_type="measure",
    artifact_id="Orders.legacy_revenue",
    deprecated_by="data_steward_456",
    reason="Replaced by Orders.revenue which includes all revenue types",
    replacement="Orders.revenue",
    sunset_date="2024-06-01"
)
Certification Status
# Get certification status
status = await service.get_certification_status(
    artifact_type="measure",
    artifact_id="Orders.revenue"
)
# CertificationInfo:
# ├── status: CertificationStatus.CERTIFIED
# ├── certified_by: "data_steward_456"
# ├── certified_at: "2024-01-15T10:30:00Z"
# ├── valid_until: "2025-01-15"
# ├── certification_notes: "Approved after validation review"
# ├── version: 3
# └── history: [
# {"status": "draft", "at": "2024-01-01", "by": "analyst_123"},
# {"status": "pending_review", "at": "2024-01-10", "by": "analyst_123"},
# {"status": "certified", "at": "2024-01-15", "by": "data_steward_456"}
# ]
# List all certified artifacts
certified = await service.list_artifacts(
    status=CertificationStatus.CERTIFIED
)
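Since a certification can carry an expiration date, callers may want to check whether it is still in force. The following is a minimal sketch: it uses a plain dict with the field names from the `CertificationInfo` output above as a stand-in for the real object, and it assumes that `valid_until` is inclusive, which the source does not state.

```python
from datetime import date


def is_certification_current(info: dict, today: date) -> bool:
    """True if status is 'certified' and valid_until (if set) has not passed."""
    if info.get("status") != "certified":
        return False
    valid_until = info.get("valid_until")
    if valid_until is None:  # No expiration was set.
        return True
    # Assumption: the certification is still valid on the valid_until date.
    return today <= date.fromisoformat(valid_until)


# Dict stand-in mirroring the fields shown in the example output.
info = {"status": "certified", "valid_until": "2025-01-15"}
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check easy to test.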
Business Glossary
# Underscore form assumed; hyphens are not valid in Python import paths.
from olytix_core.governance.glossary.models import GlossaryTerm
# Create a glossary term
term = await service.create_glossary_term(
    GlossaryTerm(
        name="Revenue",
        definition="Total income generated from sales of goods and services before any deductions",
        synonyms=["Sales", "Income", "Turnover"],
        category="Finance",
        examples=["Product sales revenue", "Service revenue", "Subscription revenue"],
        related_terms=["Gross Revenue", "Net Revenue", "ARR"],
        created_by="data_steward_456"
    )
)
# Link term to artifacts
await service.link_term_to_artifact(
    term_id=term.id,
    artifact_type="measure",
    artifact_id="Orders.revenue",
    relationship="defines"  # defines, relates_to, synonym_of
)
# Search glossary
results = await service.search_glossary(
    query="revenue",
    category="Finance",
    include_synonyms=True
)
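To make the effect of `include_synonyms` concrete, here is an in-memory sketch of synonym-aware matching. The `Term` dataclass and `search_terms` function are hypothetical helpers written for this example; how `search_glossary` actually matches and ranks terms is not described in the source.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    # Hypothetical, trimmed-down stand-in for GlossaryTerm.
    name: str
    category: str
    synonyms: list[str] = field(default_factory=list)


def search_terms(terms, query, category=None, include_synonyms=True):
    """Case-insensitive substring match on name and, optionally, synonyms."""
    q = query.lower()
    hits = []
    for term in terms:
        if category is not None and term.category != category:
            continue
        names = [term.name]
        if include_synonyms:
            names += term.synonyms
        if any(q in n.lower() for n in names):
            hits.append(term)
    return hits


# Made-up sample glossary.
terms = [
    Term("Revenue", "Finance", ["Sales", "Income", "Turnover"]),
    Term("Churn", "Customer", ["Attrition"]),
]
```

With synonyms enabled, a search for "turnover" finds the "Revenue" term; with synonyms disabled it finds nothing.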
Glossary Term Structure
GlossaryTerm:
├── id: "term_123"
├── name: "Revenue"
├── definition: "Total income generated from..."
├── synonyms: ["Sales", "Income", "Turnover"]
├── category: "Finance"
├── subcategory: "Income Statement"
├── examples: ["Product sales revenue", ...]
├── related_terms: ["Gross Revenue", "Net Revenue"]
├── formula: null # Optional for calculated terms
├── owner: "finance_team"
├── status: "approved"
│
├── linked_artifacts:
│ ├── measures: ["Orders.revenue", "Orders.gross_revenue"]
│ ├── dimensions: []
│ └── metrics: ["mrr", "arr"]
│
├── created_by: "data_steward_456"
├── created_at: "2024-01-10T09:00:00Z"
├── updated_at: "2024-01-15T14:30:00Z"
└── version: 2
Ownership Management
# Underscore form assumed; hyphens are not valid in Python import paths.
from olytix_core.governance.ownership.models import OwnershipAssignment, OwnerRole
# Assign owner to a cube
await service.assign_owner(
    OwnershipAssignment(
        artifact_type="cube",
        artifact_id="Orders",
        owner_type="user",
        owner_id="finance_manager_789",
        role=OwnerRole.DATA_OWNER,
        assigned_by="admin_001"
    )
)
# Assign steward
await service.assign_owner(
    OwnershipAssignment(
        artifact_type="cube",
        artifact_id="Orders",
        owner_type="user",
        owner_id="analyst_123",
        role=OwnerRole.DATA_STEWARD,
        assigned_by="finance_manager_789"
    )
)
# Assign team ownership
await service.assign_owner(
    OwnershipAssignment(
        artifact_type="cube",
        artifact_id="Orders",
        owner_type="team",
        owner_id="analytics_team",
        role=OwnerRole.TECHNICAL_OWNER,
        assigned_by="admin_001"
    )
)
# Get ownership info
ownership = await service.get_ownership(
    artifact_type="cube",
    artifact_id="Orders"
)
# OwnershipInfo:
# ├── data_owner: {"type": "user", "id": "finance_manager_789", "name": "Jane Smith"}
# ├── data_stewards: [{"type": "user", "id": "analyst_123", "name": "John Doe"}]
# ├── technical_owners: [{"type": "team", "id": "analytics_team", "name": "Analytics Team"}]
# └── contact_email: "analytics@company.com"
Artifact Metadata
# Update artifact metadata
await service.update_artifact_metadata(
    artifact_type="measure",
    artifact_id="Orders.revenue",
    metadata={
        "description": "Total order revenue including taxes, excluding returns",
        "tags": ["finance", "core-metric", "certified"],
        "documentation_url": "https://wiki.company.com/metrics/revenue",
        "refresh_frequency": "hourly",
        "data_classification": "internal",
        "pii_flag": False
    }
)
# Get full artifact details
details = await service.get_artifact_details(
    artifact_type="measure",
    artifact_id="Orders.revenue"
)
# ArtifactDetails:
# ├── type: "measure"
# ├── id: "Orders.revenue"
# ├── name: "Revenue"
# ├── description: "Total order revenue..."
# ├── definition: "SUM(order_items.price * order_items.quantity)"
# │
# ├── certification:
# │ ├── status: "certified"
# │ ├── certified_by: "data_steward_456"
# │ └── valid_until: "2025-01-15"
# │
# ├── ownership:
# │ ├── data_owner: "finance_manager_789"
# │ └── stewards: ["analyst_123"]
# │
# ├── glossary_terms: ["Revenue", "Sales"]
# ├── tags: ["finance", "core-metric", "certified"]
# ├── data_classification: "internal"
# │
# ├── lineage:
# │ ├── sources: ["raw.orders", "raw.order_items"]
# │ └── derived_from: ["stg_orders.total_amount"]
# │
# └── usage:
# ├── query_count_30d: 1250
# ├── unique_users_30d: 45
# └── last_queried: "2024-01-15T14:30:00Z"
API Endpoints
Certification
# Submit for certification
POST /api/v1/governance/certification/submit
{
"artifact_type": "measure",
"artifact_id": "Orders.revenue",
"notes": "Validated against finance system"
}
# Certify artifact
POST /api/v1/governance/certification/certify
{
"artifact_type": "measure",
"artifact_id": "Orders.revenue",
"notes": "Approved",
"valid_until": "2025-01-15"
}
# Get certification status
GET /api/v1/governance/certification/status?
artifact_type=measure&
artifact_id=Orders.revenue
# List artifacts by status
GET /api/v1/governance/certification/list?
status=certified&
artifact_type=measure
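The query strings above are wrapped across lines for readability; on the wire each request is a single URL. A small sketch of building one with the standard library follows. `BASE_URL` is a placeholder host and `status_url` is a hypothetical helper, neither of which comes from Olytix Core.

```python
from urllib.parse import urlencode

# Hypothetical host; substitute your deployment's base URL.
BASE_URL = "https://olytix.example.com"


def status_url(artifact_type: str, artifact_id: str) -> str:
    """Build the certification-status URL with properly encoded parameters."""
    query = urlencode({"artifact_type": artifact_type, "artifact_id": artifact_id})
    return f"{BASE_URL}/api/v1/governance/certification/status?{query}"
```

Using `urlencode` instead of string concatenation ensures that identifiers containing spaces or reserved characters are escaped correctly.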
Glossary
# Create term
POST /api/v1/governance/glossary/terms
{
"name": "Revenue",
"definition": "Total income...",
"synonyms": ["Sales", "Income"],
"category": "Finance"
}
# Search glossary
GET /api/v1/governance/glossary/search?
query=revenue&
category=Finance
# Link term to artifact
POST /api/v1/governance/glossary/terms/<term_id>/links
{
"artifact_type": "measure",
"artifact_id": "Orders.revenue",
"relationship": "defines"
}
# Get term
GET /api/v1/governance/glossary/terms/<term_id>
# List all terms
GET /api/v1/governance/glossary/terms?
category=Finance&
status=approved
Ownership
# Assign owner
POST /api/v1/governance/ownership/assign
{
"artifact_type": "cube",
"artifact_id": "Orders",
"owner_type": "user",
"owner_id": "user_123",
"role": "data_owner"
}
# Get ownership
GET /api/v1/governance/ownership?
artifact_type=cube&
artifact_id=Orders
# List artifacts by owner
GET /api/v1/governance/ownership/by-owner?
owner_id=user_123
Artifact Metadata
# Update metadata
PATCH /api/v1/governance/artifacts/<type>/<id>/metadata
{
"description": "...",
"tags": ["finance", "core"],
"data_classification": "internal"
}
# Get artifact details
GET /api/v1/governance/artifacts/<type>/<id>
# Search artifacts
GET /api/v1/governance/artifacts/search?
query=revenue&
tags=finance&
certification_status=certified
Data Catalog
Browse and discover data assets:
Catalog View
# Browse catalog
catalog = await service.browse_catalog(
    artifact_types=["cube", "metric"],
    filters={
        "certification_status": "certified",
        "tags": ["core-metric"],
        "owner_team": "analytics_team"
    },
    sort_by="popularity",  # popularity, recent, alphabetical
    limit=50
)
# CatalogResult:
# ├── total_count: 125
# ├── items: [
# │ {
# │ "type": "cube",
# │ "id": "Orders",
# │ "name": "Orders",
# │ "description": "All customer orders...",
# │ "certification_status": "certified",
# │ "owner": "finance_team",
# │ "tags": ["finance", "core"],
# │ "usage_score": 95,
# │ "measures_count": 12,
# │ "dimensions_count": 8
# │ },
# │ ...
# │ ]
# └── facets: {
# "certification_status": {"certified": 85, "draft": 30, "deprecated": 10},
# "owner_team": {"analytics": 45, "finance": 50, "marketing": 30},
# "tags": {"finance": 60, "core": 40, "marketing": 35}
# }
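The `facets` block in the result is a set of per-field value counts. As a rough illustration of how such counts can be derived from catalog items, consider the sketch below; `build_facets` and the sample items are invented for this example, and the service presumably computes facets server-side in its own way.

```python
from collections import Counter


def build_facets(items, fields=("certification_status", "owner")):
    """Count how many items carry each value of the given fields."""
    facets = {}
    for name in fields:
        facets[name] = dict(Counter(item[name] for item in items if name in item))
    return facets


# Made-up sample items using field names from the CatalogResult above.
items = [
    {"id": "Orders", "certification_status": "certified", "owner": "finance_team"},
    {"id": "Customers", "certification_status": "certified", "owner": "analytics_team"},
    {"id": "Leads", "certification_status": "draft", "owner": "analytics_team"},
]
```

Each facet's counts sum to the number of items that have that field, which matches the example output, where every facet adds up to `total_count`.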
Search & Discovery
# Full-text search
results = await service.search_artifacts(
    query="customer lifetime value",
    artifact_types=["measure", "metric"],
    filters={
        "certification_status": ["certified", "pending_review"]
    },
    include_glossary=True
)
# SearchResults:
# ├── artifacts: [
# │ {
# │ "type": "metric",
# │ "id": "customer_ltv",
# │ "name": "Customer Lifetime Value",
# │ "relevance_score": 0.95,
# │ "highlights": ["...customer lifetime value calculation..."]
# │ }
# │ ]
# ├── glossary_matches: [
# │ {
# │ "term": "Lifetime Value (LTV)",
# │ "definition": "Predicted total revenue from a customer..."
# │ }
# │ ]
# └── suggested_terms: ["CLV", "Customer Value", "LTV"]
Audit & Compliance
Access Audit
# Query audit log
audit_log = await service.query_audit_log(
    artifact_type="cube",
    artifact_id="Orders",
    event_types=["query", "export"],
    date_range=("2024-01-01", "2024-01-31"),
    user_id=None  # All users
)
# AuditLog:
# ├── total_events: 1250
# └── events: [
# {
# "timestamp": "2024-01-15T14:30:00Z",
# "event_type": "query",
# "user_id": "analyst_123",
# "artifact_type": "cube",
# "artifact_id": "Orders",
# "details": {
# "measures": ["revenue", "count"],
# "dimensions": ["region"],
# "row_count": 15
# },
# "ip_address": "10.0.1.50"
# },
# ...
# ]
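The filters passed to `query_audit_log` can be pictured as a simple predicate over events. The sketch below applies the same three filters to an in-memory list. `filter_events` is a hypothetical helper, and treating both ends of the date range as inclusive is an assumption, since the source does not say.

```python
from datetime import datetime


def filter_events(events, event_types, start, end, user_id=None):
    """Keep events of the given types whose UTC date falls within [start, end]."""
    selected = []
    for event in events:
        # Timestamps end in "Z"; rewrite it so fromisoformat also accepts
        # the value on Python versions before 3.11.
        ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
        day = ts.date().isoformat()
        if event["event_type"] not in event_types:
            continue
        if not (start <= day <= end):  # ISO dates compare correctly as strings
            continue
        if user_id is not None and event["user_id"] != user_id:
            continue
        selected.append(event)
    return selected


# Made-up sample events shaped like the AuditLog entries above.
events = [
    {"timestamp": "2024-01-15T14:30:00Z", "event_type": "query", "user_id": "analyst_123"},
    {"timestamp": "2024-02-02T09:00:00Z", "event_type": "export", "user_id": "analyst_123"},
    {"timestamp": "2024-01-20T08:00:00Z", "event_type": "change", "user_id": "admin_001"},
]
```

As in the service call, `user_id=None` means no user filter is applied.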
Change History
# Get change history
history = await service.get_change_history(
    artifact_type="measure",
    artifact_id="Orders.revenue",
    limit=20
)
# ChangeHistory:
# └── changes: [
# {
# "timestamp": "2024-01-15T10:00:00Z",
# "change_type": "definition_updated",
# "changed_by": "analyst_123",
# "before": {"sql": "SUM(price)"},
# "after": {"sql": "SUM(price * quantity)"},
# "reason": "Include quantity in revenue calculation"
# },
# {
# "timestamp": "2024-01-10T09:00:00Z",
# "change_type": "created",
# "changed_by": "analyst_123"
# }
# ]
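Each change entry carries `before` and `after` snapshots. When reviewing history it can help to reduce these to only the fields that changed. The helper below is a minimal sketch written for this document, not part of the Olytix Core API.

```python
def diff_fields(before: dict, after: dict) -> dict:
    """Map each changed key to an (old, new) pair; missing keys appear as None."""
    changed = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed


# The "definition_updated" entry from the example output above.
change = {
    "before": {"sql": "SUM(price)"},
    "after": {"sql": "SUM(price * quantity)"},
}
```

Calling `diff_fields(change["before"], change["after"])` yields a single entry for `sql`, pairing the old and new expressions.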
Compliance Reports
# Generate compliance report
report = await service.generate_compliance_report(
    report_type="data_inventory",
    include_sections=[
        "artifact_summary",
        "certification_status",
        "ownership_coverage",
        "access_patterns",
        "data_classification"
    ],
    format="pdf"
)
# Report includes:
# - Total artifacts by type and status
# - Certification coverage percentage
# - Ownership assignment coverage
# - Access frequency by user/team
# - Data classification distribution
# - PII flag summary
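The coverage figures in the report are percentages of artifacts that meet a condition. The sketch below shows one plausible way to compute them. The `coverage` helper, the dict layout, and the sample inventory are assumptions for illustration; the report generator's actual definitions (for example, whether deprecated artifacts count toward the total) are not specified in the source.

```python
def coverage(artifacts, predicate) -> float:
    """Percentage of artifacts satisfying predicate, rounded to one decimal."""
    if not artifacts:
        return 0.0
    matching = sum(1 for a in artifacts if predicate(a))
    return round(100 * matching / len(artifacts), 1)


# Made-up sample inventory.
artifacts = [
    {"id": "Orders.revenue", "status": "certified", "data_owner": "finance_manager_789"},
    {"id": "Orders.count", "status": "draft", "data_owner": "finance_manager_789"},
    {"id": "Orders.legacy_revenue", "status": "deprecated", "data_owner": None},
    {"id": "customer_ltv", "status": "certified", "data_owner": None},
]

certified_pct = coverage(artifacts, lambda a: a["status"] == "certified")
owned_pct = coverage(artifacts, lambda a: a["data_owner"] is not None)
```

In this sample, two of four artifacts are certified and two of four have a data owner, so both figures are 50.0.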
Best Practices
Certification
- Define clear criteria - Document what "certified" means
- Regular reviews - Re-certify periodically (e.g., annually)
- Track expiration - Don't let certifications lapse
- Deprecation process - Clear timeline and alternatives
Glossary
- Start with core terms - Focus on high-impact definitions
- Involve stakeholders - Get business input on definitions
- Link to artifacts - Connect terms to measures/dimensions
- Review regularly - Keep definitions current
Ownership
- Every asset needs an owner - No orphaned data
- Clear escalation - Owner → Steward → Technical Owner
- Document responsibilities - What each role does
- Regular review - Update when people change roles
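The escalation order above can be derived directly from an ownership record. The following sketch uses a dict shaped like the `OwnershipInfo` output shown earlier; `escalation_chain` is a hypothetical helper, not an Olytix Core function.

```python
def escalation_chain(ownership: dict) -> list[str]:
    """List contact IDs in the order Owner -> Steward -> Technical Owner."""
    chain = []
    owner = ownership.get("data_owner")
    if owner:
        chain.append(owner["id"])
    for role in ("data_stewards", "technical_owners"):
        chain.extend(entry["id"] for entry in ownership.get(role, []))
    return chain


# Dict stand-in using the IDs from the ownership example.
ownership = {
    "data_owner": {"type": "user", "id": "finance_manager_789"},
    "data_stewards": [{"type": "user", "id": "analyst_123"}],
    "technical_owners": [{"type": "team", "id": "analytics_team"}],
}
```

An artifact with no assignments produces an empty chain, which is a quick way to spot orphaned assets.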
Compliance
- Enable audit logging - Track all access
- Regular reports - Monthly compliance reviews
- Data classification - Tag all sensitive data
- Retention policies - Define and enforce
Configuration
# governance configuration
governance:
  certification:
    enabled: true
    default_validity_days: 365
    require_review_notes: true
    auto_deprecate_after_days: 90  # After expiration
  glossary:
    enabled: true
    require_approval: true
    sync_from_external: null  # Optional external glossary URL
  ownership:
    require_owner: true
    require_steward: false
    allow_team_ownership: true
  audit:
    enabled: true
    retention_days: 365
    log_queries: true
    log_exports: true
    log_changes: true
  compliance:
    data_classification_required: true
    pii_flagging_required: true
    report_schedule: "0 0 1 * *"  # Monthly
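To see how the two certification timing settings interact, the sketch below computes the dates they imply. It assumes that `default_validity_days` is counted from the certification date and that `auto_deprecate_after_days` is counted from `valid_until`; the source only labels the latter "After expiration", so treat both helpers as hypothetical.

```python
from datetime import date, timedelta

# Values copied from the configuration above.
DEFAULT_VALIDITY_DAYS = 365
AUTO_DEPRECATE_AFTER_DAYS = 90


def default_expiry(certified_on: date) -> date:
    """Expiry date when no explicit valid_until is given (assumed semantics)."""
    return certified_on + timedelta(days=DEFAULT_VALIDITY_DAYS)


def auto_deprecation_date(valid_until: date) -> date:
    """Date on which an expired certification would be auto-deprecated."""
    return valid_until + timedelta(days=AUTO_DEPRECATE_AFTER_DAYS)
```

Note that a fixed 365-day window does not always land on the calendar anniversary: because 2024 is a leap year, a certification dated 2024-01-15 would expire on 2025-01-14 under this reading.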
Next Steps
- Security Overview - Access control and data protection
- Audit Logging - Detailed audit configuration
- Metric Certification - Business perspective