Skip to main content

Column Masking

For Data Analysts

Column masking protects sensitive data by replacing actual values with masked alternatives based on user permissions. Users without proper access see masked data while authorized users see the original values.

Overview

Column masking in Olytix Core:

  • Protects PII, financial data, and other sensitive information
  • Applies transparently at query time
  • Supports multiple masking functions for different data types
  • Works alongside RLS for comprehensive data protection

How Masking Works

┌─────────────────────────────────────────────────────────────────┐
│ User Query │
│ SELECT customer_name, email, phone, revenue FROM customers │
└───────────────────────────┬─────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ Masking Policy Engine │
│ User: analyst@company.com │
│ Roles: [analyst] (not in exempt_roles: [admin, compliance]) │
│ │
│ Policies: │
│ - email: partial (j***@company.com) │
│ - phone: partial (***-***-1234) │
│ - revenue: numeric_range (1000-2000) │
└───────────────────────────┬─────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ Query Result │
│ customer_name | email | phone | revenue │
│ John Smith | j***@company.com | ***-***-1234 | 1000-2000 │
│ Jane Doe | j***@example.org | ***-***-5678 | 2000-3000 │
└─────────────────────────────────────────────────────────────────┘

Defining Masking Policies

YAML Configuration

# security/masking/email_masking.yml
name: email_masking
description: Mask email addresses for non-privileged users
cube: customers
column: email
enabled: true
priority: 10

# Masking configuration
mask_function: partial
mask_parameters:
pattern: email # j***@domain.com

# Users exempt from masking
exempt_roles:
- admin
- compliance
- data_steward

exempt_groups:
- pii_access

Masking Functions

Olytix Core provides multiple masking functions:

FunctionDescriptionExample InputExample Output
nullReplace with NULLjohn@email.comNULL
redactReplace with redacted textjohn@email.com***REDACTED***
hashSHA256 hashjohn@email.coma1b2c3d4...
partialShow partial valuejohn@email.comj***@email.com
truncateTruncate to N charsjohn@email.comjohn@emai
first_nShow first N charsjohn@email.comjoh***
last_nShow last N charsjohn@email.com***com
constantReplace with constantjohn@email.com[MASKED]
randomReplace with randomjohn@email.comf7a3b2c1...
date_truncateTruncate date precision2024-03-152024-03-01
numeric_rangeShow value as range15231500-2000

Masking Function Examples

Email Masking

name: email_partial
cube: customers
column: email
mask_function: partial
mask_parameters:
pattern: email # j***@domain.com
exempt_roles:
- admin

Result: john.smith@company.com becomes j***@company.com

Phone Number Masking

name: phone_partial
cube: customers
column: phone
mask_function: partial
mask_parameters:
pattern: phone # ***-***-1234
exempt_roles:
- support

Result: 555-123-4567 becomes ***-***-4567

Credit Card Masking

name: credit_card_mask
cube: payments
column: card_number
mask_function: partial
mask_parameters:
pattern: credit_card # ****-****-****-1234
exempt_roles:
- payment_admin

Result: 4532-1234-5678-9012 becomes ****-****-****-9012

SSN Masking

name: ssn_mask
cube: employees
column: ssn
mask_function: partial
mask_parameters:
pattern: ssn # ***-**-1234
exempt_roles:
- hr_admin

Result: 123-45-6789 becomes ***-**-6789

Full Redaction

name: salary_redact
cube: employees
column: salary
mask_function: redact
mask_parameters:
text: "[CONFIDENTIAL]"
exempt_roles:
- hr_manager
- executive

Result: 150000 becomes [CONFIDENTIAL]

Numeric Range

name: salary_range
cube: employees
column: salary
mask_function: numeric_range
mask_parameters:
step: 10000 # Show as ranges of 10K
exempt_roles:
- hr_manager

Result: 147500 becomes 140000-150000

Date Truncation

name: birthdate_truncate
cube: customers
column: birth_date
mask_function: date_truncate
mask_parameters:
granularity: year # Show only year
exempt_roles:
- compliance

Result: 1985-03-15 becomes 1985-01-01

Hashing

name: customer_id_hash
cube: analytics
column: customer_id
mask_function: hash
exempt_roles:
- data_engineer

Result: CUST12345 becomes a1b2c3d4e5f6...

First N Characters

name: name_first_n
cube: customers
column: full_name
mask_function: first_n
mask_parameters:
n: 2
mask_rest: "***"
exempt_roles:
- support

Result: John Smith becomes Jo***

Last N Characters

name: account_last_n
cube: accounts
column: account_number
mask_function: last_n
mask_parameters:
n: 4
mask_prefix: "****"
exempt_roles:
- finance

Result: 1234567890 becomes ****7890

Wildcard Column Matching

Apply masking to multiple columns:

name: mask_all_emails
cube: "*" # All cubes
column: "*email*" # Any column containing 'email'
mask_function: partial
mask_parameters:
pattern: email
exempt_roles:
- admin
name: mask_pii_columns
cube: customers
column: "pii_*" # Columns starting with 'pii_'
mask_function: redact
exempt_roles:
- compliance

Cube-Level Configuration

Configure masking at the cube level:

# cubes/customers.yml
name: customers
sql_table: public.customers

dimensions:
- name: email
type: string
sql: email
meta:
sensitive: true
masking_policy: email_masking

- name: phone
type: string
sql: phone
meta:
sensitive: true
masking_policy: phone_masking

- name: ssn
type: string
sql: ssn
meta:
sensitive: true
pii_category: "ssn"

security:
masking:
enabled: true
policies:
- email_masking
- phone_masking
- ssn_masking

API Configuration

Create Policy

curl -X POST http://localhost:8000/api/v1/security/masking-policies \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "email_masking",
"description": "Mask customer emails",
"cube": "customers",
"column": "email",
"mask_function": "partial",
"mask_parameters": {
"pattern": "email"
},
"exempt_roles": ["admin", "support"],
"enabled": true
}'

List Policies

curl http://localhost:8000/api/v1/security/masking-policies \
-H "Authorization: Bearer $ADMIN_TOKEN"

Check Column Masking

curl http://localhost:8000/api/v1/security/masking-policies/check \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"cube": "customers",
"columns": ["email", "phone", "name"]
}'

Response:

{
"columns": {
"email": {
"is_masked": true,
"policy_name": "email_masking",
"mask_function": "partial"
},
"phone": {
"is_masked": true,
"policy_name": "phone_masking",
"mask_function": "partial"
},
"name": {
"is_masked": false,
"policy_name": null,
"mask_function": null
}
}
}

Query Response with Masking

When masking is applied, the response includes metadata:

{
"data": [
{
"customer_name": "John Smith",
"email": "j***@company.com",
"phone": "***-***-4567",
"revenue": "10000-20000"
}
],
"security": {
"masking": {
"policies_applied": ["email_masking", "phone_masking", "revenue_range"],
"columns_masked": ["email", "phone", "revenue"],
"columns_unmasked": ["customer_name"],
"evaluation_time_ms": 0.8
}
}
}

Testing Masking Policies

Dry Run

curl -X POST http://localhost:8000/api/v1/security/masking-policies/evaluate \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"cube": "customers",
"columns": ["email", "phone", "name", "ssn"],
"user_context": {
"user_id": "test_user",
"roles": ["analyst"]
}
}'

CLI Testing

olytix-core security test-masking \
--cube customers \
--columns email,phone,ssn \
--roles analyst

Combining with RLS

Masking works together with RLS:

# RLS Policy - filter rows
name: regional_access
cube: customers
filter_sql: "region = $user.region"
roles:
- analyst

# Masking Policy - mask columns
name: email_masking
cube: customers
column: email
mask_function: partial
exempt_roles:
- admin

Result:

  1. RLS filters rows to user's region
  2. Masking hides email details in filtered results
  3. User sees only authorized rows with masked sensitive columns

SQL Dialect Support

Masking functions generate dialect-specific SQL:

PostgreSQL

-- partial email
CONCAT(LEFT(email, 1), '***', SUBSTRING(email, POSITION('@' IN email)))

-- hash
ENCODE(SHA256((customer_id)::BYTEA), 'hex')

-- date_truncate
DATE_TRUNC('month', birth_date)

MySQL

-- partial email
CONCAT(LEFT(email, 1), '***', SUBSTRING(email, LOCATE('@', email)))

-- hash
SHA2(customer_id, 256)

-- date_truncate
DATE_FORMAT(birth_date, '%Y-%m-01')

Common Patterns

PII Protection

# Comprehensive PII masking
policies:
- name: mask_emails
column: "*email*"
mask_function: partial
mask_parameters:
pattern: email

- name: mask_phones
column: "*phone*"
mask_function: partial
mask_parameters:
pattern: phone

- name: mask_ssn
column: "*ssn*"
mask_function: partial
mask_parameters:
pattern: ssn

- name: mask_addresses
column: "*address*"
mask_function: redact

Financial Data Protection

policies:
- name: mask_credit_cards
column: "*card*"
mask_function: partial
mask_parameters:
pattern: credit_card

- name: mask_account_numbers
column: "*account*"
mask_function: last_n
mask_parameters:
n: 4

- name: mask_amounts
column: "*amount*"
mask_function: numeric_range
mask_parameters:
step: 1000

Role-Based Exemptions

# Different roles see different levels of detail
name: salary_tiered
cube: employees
column: salary

# Base policy - show ranges
mask_function: numeric_range
mask_parameters:
step: 25000

# Exempt roles see actual values
exempt_roles:
- hr_admin
- executive
- payroll

Best Practices

  1. Mask by default - Start with masking enabled, exempt specific roles
  2. Use appropriate functions - Choose masking that preserves data utility
  3. Document exemptions - Clearly document why roles are exempt
  4. Test thoroughly - Verify masking works as expected
  5. Audit access - Log when exempt users access sensitive data
  6. Review regularly - Periodically audit masking policies

Compliance Considerations

RegulationRelevant Masking Features
GDPREmail/phone masking, right to be forgotten
HIPAAPatient identifier masking, SSN protection
PCI DSSCredit card masking (show last 4 only)
CCPAConsumer data masking, data minimization

Troubleshooting

Data Still Visible

  1. Check if user has exempt role
  2. Verify policy is enabled
  3. Confirm column name matches pattern

Wrong Masking Applied

  1. Check policy priority
  2. Verify column pattern matching
  3. Review mask_parameters

Performance Issues

  1. Limit wildcard column patterns
  2. Cache policy evaluations
  3. Consider pre-masking in ETL for large datasets

Next Steps