# PostgreSQL Integration

PostgreSQL is a powerful, open-source relational database that works well as a data warehouse for small to medium-sized datasets. Olytix Core connects to it through the asyncpg driver for high-performance asynchronous query execution.
## Prerequisites

- PostgreSQL 12 or later
- Network access from Olytix Core to your PostgreSQL instance
- A database user with read access to the source tables
## Installation

Install the PostgreSQL connector (the extras specifier is quoted so shells like zsh don't expand the brackets):

```bash
pip install "olytix-core[postgresql]"

# Or install asyncpg directly
pip install asyncpg
```
## Configuration

### Basic Configuration

Add the warehouse configuration to your `olytix-core_project.yml`:

```yaml
name: my_analytics
version: 1.0.0

warehouse:
  type: postgresql
  host: localhost
  port: 5432
  database: analytics
  user: ${OLYTIX_DB_USER}
  password: ${OLYTIX_DB_PASSWORD}
  schema: public
```
### Environment Variables

Set your credentials using environment variables:

```bash
export OLYTIX_DB_USER=analytics_user
export OLYTIX_DB_PASSWORD=your_secure_password
```

Or use a `.env` file:

```bash
# .env
OLYTIX_DB_USER=analytics_user
OLYTIX_DB_PASSWORD=your_secure_password
```
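Placeholders like `${OLYTIX_DB_USER}` are expanded from the environment when the config is loaded. A minimal sketch of how that substitution could work (hypothetical; Olytix Core's actual implementation may differ):

```python
import os
import re

# Matches ${VAR_NAME} placeholders in raw config text.
_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(raw: str) -> str:
    """Replace each ${VAR} with os.environ[VAR], failing loudly if unset."""
    def _sub(match):
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return _PLACEHOLDER.sub(_sub, raw)

os.environ["OLYTIX_DB_USER"] = "analytics_user"
print(expand_env("user: ${OLYTIX_DB_USER}"))  # user: analytics_user
```

Failing loudly on a missing variable is deliberate: a silently empty password produces a much more confusing authentication error later.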
### Complete Configuration Options

```yaml
warehouse:
  type: postgresql

  # Connection settings
  host: db.example.com
  port: 5432
  database: analytics
  schema: public
  user: ${OLYTIX_DB_USER}
  password: ${OLYTIX_DB_PASSWORD}

  # Connection pool settings
  min_connections: 2
  max_connections: 10
  connection_timeout: 30  # seconds
  query_timeout: 300      # seconds (5 minutes)

  # SSL settings
  ssl_enabled: true
  ssl_ca_cert: /path/to/ca-certificate.pem
  ssl_client_cert: /path/to/client-cert.pem
  ssl_client_key: /path/to/client-key.pem
```
## Connection Pool Settings

Olytix Core uses connection pooling to manage database connections efficiently:

| Setting | Default | Description |
|---|---|---|
| `min_connections` | 1 | Minimum connections to maintain |
| `max_connections` | 10 | Maximum connections allowed |
| `connection_timeout` | 30 | Seconds to wait when acquiring a connection |
| `query_timeout` | 300 | Maximum query execution time in seconds |
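These settings correspond closely to asyncpg's pool and connection parameters (`min_size`, `max_size`, `timeout`, and `command_timeout` are real asyncpg argument names; the mapping function itself is an illustration, not Olytix Core's actual code):

```python
# Hypothetical mapping from the warehouse config to asyncpg's
# create_pool()/connect() keyword arguments.
def pool_kwargs(cfg: dict) -> dict:
    return {
        "min_size": cfg.get("min_connections", 1),        # pool floor
        "max_size": cfg.get("max_connections", 10),       # pool ceiling
        "timeout": cfg.get("connection_timeout", 30),     # connect timeout (s)
        "command_timeout": cfg.get("query_timeout", 300), # per-query limit (s)
    }

print(pool_kwargs({"min_connections": 2, "max_connections": 10}))
```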
### Sizing Recommendations

| Use Case | `min_connections` | `max_connections` |
|---|---|---|
| Development | 1 | 5 |
| Small team (< 10 users) | 2 | 10 |
| Medium team (10-50 users) | 5 | 25 |
| Production (> 50 users) | 10 | 50 |
## SSL/TLS Configuration

For production deployments, enable SSL encryption.

### Self-Signed Certificates

```yaml
warehouse:
  type: postgresql
  host: db.example.com
  ssl_enabled: true
  ssl_ca_cert: /etc/olytix-core/certs/ca.pem
```
### Client Certificate Authentication

```yaml
warehouse:
  type: postgresql
  host: db.example.com
  ssl_enabled: true
  ssl_ca_cert: /etc/olytix-core/certs/ca.pem
  ssl_client_cert: /etc/olytix-core/certs/client.pem
  ssl_client_key: /etc/olytix-core/certs/client.key
```
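Internally, these keys would translate into a standard `ssl.SSLContext`; a sketch under the assumption that this is how the connector builds it (the `ssl` module calls are standard Python, the mapping to the config keys is an assumption):

```python
import ssl
from typing import Optional

def build_ssl_context(ca_cert: Optional[str] = None,
                      client_cert: Optional[str] = None,
                      client_key: Optional[str] = None) -> ssl.SSLContext:
    # Server-auth purpose enables hostname checking and CERT_REQUIRED.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if ca_cert:
        # Trust a custom CA (self-signed or cloud-provider bundle).
        ctx.load_verify_locations(cafile=ca_cert)
    if client_cert:
        # Present a client certificate for mutual TLS.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx

print(build_ssl_context().verify_mode)  # VerifyMode.CERT_REQUIRED
```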
### AWS RDS SSL

```yaml
warehouse:
  type: postgresql
  host: mydb.xxxxx.us-east-1.rds.amazonaws.com
  ssl_enabled: true
  ssl_ca_cert: /etc/olytix-core/certs/rds-global-bundle.pem
```

Download the RDS CA bundle from AWS:

```bash
curl -o /etc/olytix-core/certs/rds-global-bundle.pem \
  https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
```
## Testing the Connection

Verify your connection is working:

```bash
olytix-core test-connection
```

Expected output:

```
Testing connection to postgresql://analytics_user@localhost:5432/analytics
✓ Connection successful
✓ Schema 'public' accessible
✓ Found 15 tables
```
Using Python (note the underscored module name, `olytix_core` — hyphens are not valid in Python imports):

```python
import asyncio

from olytix_core.engine.connectors.postgresql import PostgreSQLConnector
from olytix_core.engine.connectors.base import ConnectionConfig, ConnectorType

config = ConnectionConfig(
    type=ConnectorType.POSTGRESQL,
    host="localhost",
    port=5432,
    database="analytics",
    username="analytics_user",
    password="your_password",
    schema="public",
)

async def test():
    connector = PostgreSQLConnector(config)
    async with connector:
        result = await connector.execute("SELECT 1 AS test")
        print(f"Connection successful: {result.data}")
        tables = await connector.get_tables()
        print(f"Available tables: {tables}")

asyncio.run(test())
```
## Defining Sources

Once connected, define your data sources:

```yaml
# sources/raw.yml
version: 2

sources:
  - name: raw
    database: analytics
    schema: public
    tables:
      - name: orders
        description: Raw e-commerce orders
        columns:
          - name: id
            description: Order ID
          - name: customer_id
            description: Customer reference
          - name: total_amount
            description: Order total in USD
          - name: created_at
            description: Order timestamp
      - name: customers
        description: Customer master data
        columns:
          - name: id
            description: Customer ID
          - name: email
            description: Customer email
          - name: created_at
            description: Registration date
```
## Query Execution

Olytix Core executes queries against PostgreSQL using optimized patterns.

### Standard Execution

```python
result = await connector.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
)
print(f"Rows: {result.row_count}, Time: {result.execution_time_ms}ms")
```
### Streaming Large Results

For large result sets, stream in batches to avoid loading everything into memory:

```python
async for batch in connector.execute_streaming(
    "SELECT * FROM large_table",
    batch_size=10000,
):
    process_batch(batch)
```
### Batch Inserts

For pre-aggregation tables:

```python
data = [
    {"date": "2024-01-01", "revenue": 50000},
    {"date": "2024-01-02", "revenue": 52000},
]
rows = await connector.insert_batch("daily_revenue", data)
```
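Conceptually, a batch insert expands into a single parameterized `INSERT` executed once per row, with asyncpg-style `$n` placeholders. A hypothetical sketch of the statement construction (`insert_batch`'s real internals may differ):

```python
# Build a parameterized INSERT from a list of uniform dicts.
# Column order is taken from the first row's keys.
def build_insert(table: str, rows: list) -> tuple:
    columns = list(rows[0])
    placeholders = ", ".join(f"${i + 1}" for i in range(len(columns)))
    sql = f'INSERT INTO {table} ({", ".join(columns)}) VALUES ({placeholders})'
    # One parameter tuple per row, in column order.
    params = [tuple(row[c] for c in columns) for row in rows]
    return sql, params

sql, params = build_insert(
    "daily_revenue",
    [{"date": "2024-01-01", "revenue": 50000}],
)
print(sql)  # INSERT INTO daily_revenue (date, revenue) VALUES ($1, $2)
```

With asyncpg, the resulting pair is exactly what `executemany(sql, params)` expects.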
## Performance Optimization

### Indexing Recommendations

Create indexes on commonly filtered columns:

```sql
-- Indexes for typical analytical queries
CREATE INDEX idx_orders_created_at ON orders(created_at);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);

-- Composite index for common query patterns
CREATE INDEX idx_orders_date_status ON orders(created_at, status);
```
### Partitioning

For large tables, use PostgreSQL range partitioning:

```sql
-- Create partitioned table
CREATE TABLE orders (
    id SERIAL,
    customer_id INTEGER,
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```
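Writing one `CREATE TABLE ... PARTITION OF` statement per month is repetitive, so the DDL is usually generated. A small hypothetical helper:

```python
from datetime import date

def monthly_partition_ddl(table: str, year: int, month: int) -> str:
    """Generate DDL for one monthly range partition of `table`."""
    start = date(year, month, 1)
    # Upper bound is exclusive: the first day of the next month.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"{table}_{year}_{month:02d}"
    return (f"CREATE TABLE {name} PARTITION OF {table}\n"
            f"    FOR VALUES FROM ('{start}') TO ('{end}');")

print(monthly_partition_ddl("orders", 2024, 1))
```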
### Materialized Views

Use PostgreSQL materialized views for pre-aggregation. Note that `REFRESH ... CONCURRENTLY` requires a unique index on the materialized view:

```sql
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
    DATE_TRUNC('day', created_at) AS date,
    SUM(total_amount) AS revenue,
    COUNT(*) AS order_count
FROM orders
GROUP BY 1;

-- Required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX idx_mv_daily_revenue_date ON mv_daily_revenue (date);

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;
```
## Troubleshooting

### Connection Refused

```
Error: Connection refused to localhost:5432
```

Solutions:

- Verify PostgreSQL is running: `pg_isready -h localhost -p 5432`
- Check that `pg_hba.conf` allows connections from the Olytix Core host
- Verify firewall rules allow traffic on port 5432
### Authentication Failed

```
Error: password authentication failed for user "analytics_user"
```

Solutions:

- Verify the credentials in your environment variables
- Check that the user exists: `SELECT usename FROM pg_user;`
- Verify the authentication method in `pg_hba.conf`
### SSL Required

```
Error: SSL connection is required
```

Solutions:

- Enable SSL in the configuration: `ssl_enabled: true`
- Provide the CA certificate path via `ssl_ca_cert`
- Or allow non-SSL connections in PostgreSQL (not recommended for production)
### Timeout Errors

```
Error: Query exceeded timeout of 300 seconds
```

Solutions:

- Increase `query_timeout` in the configuration
- Optimize the underlying query
- Add appropriate indexes
- Consider pre-aggregations for complex queries
## Cloud Provider Configurations

### AWS RDS

```yaml
warehouse:
  type: postgresql
  host: mydb.xxxxx.us-east-1.rds.amazonaws.com
  port: 5432
  database: analytics
  user: ${OLYTIX_DB_USER}
  password: ${OLYTIX_DB_PASSWORD}
  ssl_enabled: true
  ssl_ca_cert: /etc/olytix-core/certs/rds-ca-bundle.pem
```
### Google Cloud SQL

```yaml
warehouse:
  type: postgresql
  host: /cloudsql/project:region:instance  # Unix socket
  database: analytics
  user: ${OLYTIX_DB_USER}
  password: ${OLYTIX_DB_PASSWORD}
```
### Azure Database for PostgreSQL

```yaml
warehouse:
  type: postgresql
  host: myserver.postgres.database.azure.com
  port: 5432
  database: analytics
  user: myuser@myserver
  password: ${OLYTIX_DB_PASSWORD}
  ssl_enabled: true
```