
PostgreSQL Integration

For Data Analysts

PostgreSQL is a powerful, open-source relational database that works well as a data warehouse for small to medium-sized datasets. Olytix Core uses the high-performance asyncpg driver for efficient query execution.

Prerequisites

  • PostgreSQL 12 or later
  • Network access from Olytix Core to your PostgreSQL instance
  • Database user with read access to source tables

Installation

Install the PostgreSQL connector:

pip install olytix-core[postgresql]

# Or install asyncpg directly
pip install asyncpg

Configuration

Basic Configuration

Add the warehouse configuration to your olytix-core_project.yml:

name: my_analytics
version: 1.0.0

warehouse:
  type: postgresql
  host: localhost
  port: 5432
  database: analytics
  user: ${OLYTIX_DB_USER}
  password: ${OLYTIX_DB_PASSWORD}
  schema: public

Environment Variables

Set your credentials using environment variables:

export OLYTIX_DB_USER=analytics_user
export OLYTIX_DB_PASSWORD=your_secure_password

Or use a .env file:

# .env
OLYTIX_DB_USER=analytics_user
OLYTIX_DB_PASSWORD=your_secure_password
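The `${...}` placeholders in the configuration are resolved from the environment. A minimal sketch of that kind of interpolation (the `expand_env` helper is illustrative, not Olytix Core's actual resolution code):

```python
import os
import re

def expand_env(value: str) -> str:
    """Replace ${VAR} placeholders with values from the environment."""
    def repl(match):
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return re.sub(r"\$\{([A-Za-z0-9_]+)\}", repl, value)

os.environ["OLYTIX_DB_USER"] = "analytics_user"
print(expand_env("user: ${OLYTIX_DB_USER}"))  # user: analytics_user
```

Failing loudly on an unset variable is usually preferable to silently passing an empty credential to the database.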

Complete Configuration Options

warehouse:
  type: postgresql

  # Connection settings
  host: db.example.com
  port: 5432
  database: analytics
  schema: public
  user: ${OLYTIX_DB_USER}
  password: ${OLYTIX_DB_PASSWORD}

  # Connection pool settings
  min_connections: 2
  max_connections: 10
  connection_timeout: 30  # seconds
  query_timeout: 300      # seconds (5 minutes)

  # SSL settings
  ssl_enabled: true
  ssl_ca_cert: /path/to/ca-certificate.pem
  ssl_client_cert: /path/to/client-cert.pem
  ssl_client_key: /path/to/client-key.pem

Connection Pool Settings

Olytix Core uses connection pooling to efficiently manage database connections:

Setting             Default   Description
min_connections     1         Minimum connections to maintain
max_connections     10        Maximum connections allowed
connection_timeout  30        Seconds to wait for a connection
query_timeout       300       Maximum query execution time (seconds)

Sizing Recommendations

Use Case                    min_connections   max_connections
Development                 1                 5
Small team (< 10 users)     2                 10
Medium team (10-50 users)   5                 25
Production (> 50 users)     10                50

SSL/TLS Configuration

For production deployments, enable SSL encryption:

Self-Signed Certificates

warehouse:
  type: postgresql
  host: db.example.com
  ssl_enabled: true
  ssl_ca_cert: /etc/olytix-core/certs/ca.pem

Client Certificate Authentication

warehouse:
  type: postgresql
  host: db.example.com
  ssl_enabled: true
  ssl_ca_cert: /etc/olytix-core/certs/ca.pem
  ssl_client_cert: /etc/olytix-core/certs/client.pem
  ssl_client_key: /etc/olytix-core/certs/client.key

AWS RDS SSL

warehouse:
  type: postgresql
  host: mydb.xxxxx.us-east-1.rds.amazonaws.com
  ssl_enabled: true
  ssl_ca_cert: /etc/olytix-core/certs/rds-ca-2019-root.pem

Download the RDS CA certificate from AWS:

curl -o /etc/olytix-core/certs/rds-ca-2019-root.pem \
https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
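asyncpg accepts a standard `ssl.SSLContext`, so the `ssl_ca_cert` setting corresponds to a verifying context built from that bundle. A sketch (the `make_ssl_context` helper is illustrative; Olytix Core builds the context from the `ssl_*` settings itself):

```python
import ssl
from typing import Optional

def make_ssl_context(ca_path: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying TLS context; with no CA path, the system
    trust store is used instead of a downloaded bundle."""
    ctx = ssl.create_default_context(cafile=ca_path)
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx

# With asyncpg directly, the context is passed as the `ssl` argument:
#   conn = await asyncpg.connect(dsn, ssl=make_ssl_context(
#       "/etc/olytix-core/certs/rds-ca-2019-root.pem"))
```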

Testing the Connection

Verify your connection is working:

olytix-core test-connection

Expected output:

Testing connection to postgresql://analytics_user@localhost:5432/analytics
✓ Connection successful
✓ Schema 'public' accessible
✓ Found 15 tables

Using Python:

import asyncio

# The package installs as olytix-core but imports with underscores
from olytix_core.engine.connectors.postgresql import PostgreSQLConnector
from olytix_core.engine.connectors.base import ConnectionConfig, ConnectorType

config = ConnectionConfig(
    type=ConnectorType.POSTGRESQL,
    host="localhost",
    port=5432,
    database="analytics",
    username="analytics_user",
    password="your_password",
    schema="public",
)

async def test():
    connector = PostgreSQLConnector(config)
    async with connector:
        result = await connector.execute("SELECT 1 as test")
        print(f"Connection successful: {result.data}")

        tables = await connector.get_tables()
        print(f"Available tables: {tables}")

asyncio.run(test())

Defining Sources

Once connected, define your data sources:

# sources/raw.yml
version: 2

sources:
  - name: raw
    database: analytics
    schema: public
    tables:
      - name: orders
        description: Raw e-commerce orders
        columns:
          - name: id
            description: Order ID
          - name: customer_id
            description: Customer reference
          - name: total_amount
            description: Order total in USD
          - name: created_at
            description: Order timestamp

      - name: customers
        description: Customer master data
        columns:
          - name: id
            description: Customer ID
          - name: email
            description: Customer email
          - name: created_at
            description: Registration date

Query Execution

Olytix Core executes queries against PostgreSQL using optimized patterns:

Standard Execution

result = await connector.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
)
print(f"Rows: {result.row_count}, Time: {result.execution_time_ms}ms")

Streaming Large Results

For large datasets, use streaming to avoid memory issues:

async for batch in connector.execute_streaming(
    "SELECT * FROM large_table",
    batch_size=10000
):
    process_batch(batch)
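Under the hood, streaming of this kind typically rides on a server-side cursor (`conn.cursor(query, prefetch=...)` in asyncpg) whose rows are grouped into batches. The grouping itself can be sketched as a plain helper (illustrative, not the connector's actual code):

```python
from typing import Iterable, Iterator, List

def chunked(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Group rows into fixed-size batches; the final batch may be short."""
    batch: List = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

print(list(chunked(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

Because batches are yielded one at a time, memory usage stays proportional to `batch_size` rather than the full result set.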

Batch Inserts

For pre-aggregation tables:

data = [
    {"date": "2024-01-01", "revenue": 50000},
    {"date": "2024-01-02", "revenue": 52000},
]
rows = await connector.insert_batch("daily_revenue", data)
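If you load data with asyncpg directly, `copy_records_to_table` is usually the fastest path; it takes column names plus row tuples rather than dicts. A sketch of the conversion (the `to_records` helper is illustrative, not part of the connector API):

```python
from typing import Dict, List, Tuple

def to_records(data: List[Dict]) -> Tuple[List[str], List[tuple]]:
    """Split uniform dicts into (column names, row tuples)."""
    if not data:
        return [], []
    columns = list(data[0])
    records = [tuple(row[col] for col in columns) for row in data]
    return columns, records

cols, recs = to_records([
    {"date": "2024-01-01", "revenue": 50000},
    {"date": "2024-01-02", "revenue": 52000},
])
print(cols)  # ['date', 'revenue']
print(recs)  # [('2024-01-01', 50000), ('2024-01-02', 52000)]

# With asyncpg (hypothetical `conn`):
#   await conn.copy_records_to_table("daily_revenue", records=recs, columns=cols)
```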

Performance Optimization

Indexing Recommendations

Create indexes on commonly filtered columns:

-- Indexes for typical analytical queries
CREATE INDEX idx_orders_created_at ON orders(created_at);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);

-- Composite index for common query patterns
CREATE INDEX idx_orders_date_status ON orders(created_at, status);

Partitioning

For large tables, use PostgreSQL partitioning:

-- Create partitioned table
CREATE TABLE orders (
    id SERIAL,
    customer_id INTEGER,
    total_amount DECIMAL(10,2),
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
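Range partitions must exist before rows arrive, so monthly partitions are usually pre-created on a schedule. A sketch that generates the DDL for one month (the `month_partition_ddl` helper is illustrative):

```python
from datetime import date

def month_partition_ddl(table: str, year: int, month: int) -> str:
    """Build CREATE TABLE ... PARTITION OF DDL for one calendar month."""
    start = date(year, month, 1)
    # Range upper bounds are exclusive, so the end is the first of next month
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (
        f"CREATE TABLE {table}_{year:04d}_{month:02d} PARTITION OF {table}\n"
        f"    FOR VALUES FROM ('{start}') TO ('{end}');"
    )

print(month_partition_ddl("orders", 2024, 1))
```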

Materialized Views

Use PostgreSQL materialized views for pre-aggregation:

CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT
    DATE_TRUNC('day', created_at) AS date,
    SUM(total_amount) AS revenue,
    COUNT(*) AS order_count
FROM orders
GROUP BY 1;

-- CONCURRENTLY requires a unique index on the view
CREATE UNIQUE INDEX idx_mv_daily_revenue_date ON mv_daily_revenue (date);

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;
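The refresh has to be scheduled somewhere (cron, a task queue, or a small loop). A sketch of a periodic refresh using asyncpg directly; the helper names are illustrative, and CONCURRENTLY assumes the view has the unique index it requires:

```python
import asyncio

def refresh_sql(view: str, concurrently: bool = True) -> str:
    """CONCURRENTLY avoids blocking readers but needs a unique index
    on the materialized view."""
    kw = "CONCURRENTLY " if concurrently else ""
    return f"REFRESH MATERIALIZED VIEW {kw}{view};"

async def refresh_periodically(dsn: str, view: str, every_seconds: int = 3600):
    """Refresh `view` on a fixed interval (sketch; a cron job works too)."""
    import asyncpg  # deferred so refresh_sql works without the driver

    conn = await asyncpg.connect(dsn)
    try:
        while True:
            await conn.execute(refresh_sql(view))
            await asyncio.sleep(every_seconds)
    finally:
        await conn.close()
```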

Troubleshooting

Connection Refused

Error: Connection refused to localhost:5432

Solutions:

  1. Verify PostgreSQL is running: pg_isready -h localhost -p 5432
  2. Check pg_hba.conf allows connections from Olytix Core host
  3. Verify firewall rules allow port 5432

Authentication Failed

Error: password authentication failed for user "analytics_user"

Solutions:

  1. Verify credentials in environment variables
  2. Check user exists: SELECT usename FROM pg_user;
  3. Verify pg_hba.conf authentication method

SSL Required

Error: SSL connection is required

Solutions:

  1. Enable SSL in configuration: ssl_enabled: true
  2. Provide CA certificate path
  3. Or allow non-SSL in PostgreSQL (not recommended for production)

Timeout Errors

Error: Query exceeded timeout of 300 seconds

Solutions:

  1. Increase query_timeout in configuration
  2. Optimize the underlying query
  3. Add appropriate indexes
  4. Consider pre-aggregations for complex queries

Cloud Provider Configurations

AWS RDS

warehouse:
  type: postgresql
  host: mydb.xxxxx.us-east-1.rds.amazonaws.com
  port: 5432
  database: analytics
  user: ${OLYTIX_DB_USER}
  password: ${OLYTIX_DB_PASSWORD}
  ssl_enabled: true
  ssl_ca_cert: /etc/olytix-core/certs/rds-ca-bundle.pem

Google Cloud SQL

warehouse:
  type: postgresql
  host: /cloudsql/project:region:instance  # Unix socket
  database: analytics
  user: ${OLYTIX_DB_USER}
  password: ${OLYTIX_DB_PASSWORD}

Azure Database for PostgreSQL

warehouse:
  type: postgresql
  host: myserver.postgres.database.azure.com
  port: 5432
  database: analytics
  user: myuser@myserver
  password: ${OLYTIX_DB_PASSWORD}
  ssl_enabled: true

Next Steps