How to Resolve "psycopg2 OperationalError: SSL SYSCALL Error: EOF Detected" in Python
The error psycopg2.OperationalError: SSL SYSCALL error: EOF detected is one of the more cryptic PostgreSQL connection errors you'll encounter in Python. It means the SSL connection to your database was abruptly terminated: the server closed the connection without sending a proper shutdown signal.
In this guide, we'll explain what causes this error, how to diagnose the root cause, and walk through practical solutions to fix and prevent it.
What Does "SSL SYSCALL Error: EOF Detected" Mean?
Let's break down the error message:
- SSL SYSCALL error: An error occurred during a low-level SSL system call (reading from or writing to the encrypted connection).
- EOF detected: "End of File" was received, meaning the server closed the connection without the normal SSL shutdown handshake.
In simpler terms: your application tried to communicate with PostgreSQL over an encrypted connection, but the connection was already dead. This is different from a connection refusal: the connection was once active but was terminated unexpectedly.
The full error typically appears as:
```
psycopg2.OperationalError: SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Failed.
```
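Not every `OperationalError` deserves the same handling: authentication failures or a bad DSN will fail forever, while an abrupt disconnect is usually transient. As a rough sketch (the marker strings below are an assumption based on common libpq messages, not an exhaustive list), you can classify the exception's message before deciding whether a reconnect makes sense:

```python
# Hypothetical helper: classify an error message as an abrupt disconnect.
# The marker list is illustrative, based on common libpq wording.
RETRYABLE_MARKERS = (
    "eof detected",
    "connection reset by peer",
    "server closed the connection unexpectedly",
)

def is_abrupt_disconnect(error_message: str) -> bool:
    """Return True if the message looks like the server dropped the connection."""
    msg = error_message.lower()
    return any(marker in msg for marker in RETRYABLE_MARKERS)

print(is_abrupt_disconnect("SSL SYSCALL error: EOF detected"))        # True
print(is_abrupt_disconnect("FATAL: password authentication failed"))  # False
```

A check like this is what makes the retry loops later in this guide safe: they reconnect on dropped connections but re-raise everything else immediately.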
Common Causes
| Cause | Description |
|---|---|
| PostgreSQL server restart | The server was restarted while your connection was active |
| Idle connection timeout | The server, proxy, or cloud provider killed the idle connection |
| Server out of memory (OOM) | PostgreSQL was killed by the OS due to memory pressure |
| Network interruption | Network instability, VPN drops, or firewall timeout |
| PgBouncer/proxy issues | Connection pooler terminated the connection |
| Long-running queries | Query exceeded statement_timeout or was cancelled server-side |
| Max connections exceeded | Server hit its connection limit and terminated old connections |
| SSL certificate issues | Certificate expired or rotated while connection was active |
How to Diagnose the Cause
Before applying fixes, determine why the connection was closed. Check these sources:
Check PostgreSQL Server Logs
```bash
# Find the log file location
sudo -u postgres psql -c "SHOW log_directory;"
sudo -u postgres psql -c "SHOW log_filename;"

# Tail the log file
tail -f /var/log/postgresql/postgresql-15-main.log
```
Look for messages like:
```
LOG:  could not receive data from client: Connection reset by peer
LOG:  unexpected EOF on client connection with an open transaction
FATAL:  terminating connection due to administrator command
LOG:  server process (PID 12345) was terminated by signal 9: Killed
```
The last one (signal 9: Killed) indicates an OOM kill: the operating system terminated PostgreSQL due to memory pressure.
Check System Logs for OOM Events
```bash
# Linux
dmesg | grep -i "oom\|killed process"
journalctl -k | grep -i "oom"
```
Check Network Stability
```bash
# Test basic connectivity
ping -c 10 your-database-host

# Test the PostgreSQL port specifically
nc -zv your-database-host 5432
```
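If `nc` isn't available, the same port check can be done from Python with only the standard library. A minimal sketch (`port_open` is a hypothetical helper, not part of psycopg2):

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Roughly equivalent to `nc -zv host port`: attempt a plain TCP connect."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False

# Example: check whether PostgreSQL's default port answers locally
print(port_open("127.0.0.1", 5432))
```

Note this only proves the TCP port accepts connections; it says nothing about SSL negotiation or authentication, which happen after the connect.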
Solutions
Solution 1: Implement Robust Retry Logic
Since this error is almost always transient, the most practical first step is to implement retry logic with proper error handling:
```python
import psycopg2
import time
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def create_connection(dsn):
    """Create a new database connection with keepalives."""
    return psycopg2.connect(
        dsn,
        keepalives=1,
        keepalives_idle=30,
        keepalives_interval=10,
        keepalives_count=5,
        connect_timeout=10,
    )

def execute_with_retry(dsn, query, params=None, max_retries=3):
    """Execute a query with automatic reconnection on EOF errors."""
    for attempt in range(1, max_retries + 1):
        conn = None
        try:
            conn = create_connection(dsn)
            cursor = conn.cursor()
            cursor.execute(query, params)
            results = cursor.fetchall()
            conn.commit()
            cursor.close()
            conn.close()
            return results
        except psycopg2.OperationalError as e:
            error_msg = str(e).lower()
            logger.warning(f"Attempt {attempt}/{max_retries} failed: {e}")
            if conn:
                try:
                    conn.close()
                except Exception:
                    pass
            # Only retry on connection-related errors
            if "eof detected" in error_msg or "connection" in error_msg:
                if attempt < max_retries:
                    delay = 2 ** attempt  # Exponential backoff
                    logger.info(f"Retrying in {delay} seconds...")
                    time.sleep(delay)
                    continue
            raise  # Re-raise if it's not a retryable error or all retries exhausted

    raise psycopg2.OperationalError("All retry attempts failed.")

# Usage
dsn = "dbname=mydb user=postgres password=secret host=localhost sslmode=require"
try:
    results = execute_with_retry(dsn, "SELECT version();")
    print(f"Connected: {results[0][0]}")
except psycopg2.OperationalError as e:
    print(f"Failed after all retries: {e}")
```
Output (successful):
```
Connected: PostgreSQL 15.4 on x86_64-pc-linux-gnu...
```
Solution 2: Configure TCP Keepalives
Without TCP keepalives, your application won't know a connection is dead until it tries to use it. Keepalives send periodic "are you still there?" probes to detect dead connections early.
Client-side (in Python):
```python
import psycopg2

conn = psycopg2.connect(
    dbname="mydb",
    user="postgres",
    password="secret",
    host="your-db-host",
    sslmode="require",
    keepalives=1,             # Enable TCP keepalives
    keepalives_idle=30,       # Send first probe after 30 seconds of inactivity
    keepalives_interval=10,   # Send probes every 10 seconds
    keepalives_count=5,       # Consider connection dead after 5 failed probes
    connect_timeout=10,       # Timeout for initial connection
)
```
Server-side (postgresql.conf):
```ini
tcp_keepalives_idle = 30
tcp_keepalives_interval = 10
tcp_keepalives_count = 5
```

Then apply the change:

```bash
sudo systemctl restart postgresql
```
With these settings, a dead connection will be detected within approximately 80 seconds (30 + 10 × 5) instead of waiting indefinitely. Adjust values based on your network environment: cloud environments with load balancers often need shorter intervals.
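That worst-case figure is just idle time plus one interval per allowed failed probe. A one-line helper makes the arithmetic explicit (the function name is illustrative, not a psycopg2 API):

```python
def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Worst-case time to declare a silent peer dead: the idle period before
    the first probe, then `count` failed probes sent `interval` seconds apart."""
    return idle + interval * count

print(keepalive_detection_seconds(30, 10, 5))  # 80
```

Tightening any of the three parameters shortens detection time, at the cost of more probe traffic on healthy connections.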
Solution 3: Use Connection Pooling
Connection pools manage a set of reusable connections and automatically replace broken ones. This is the recommended approach for production applications.
Using psycopg2.pool:
import psycopg2
from psycopg2 import pool
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Create a thread-safe connection pool
db_pool = pool.ThreadedConnectionPool(
minconn=2,
maxconn=10,
dsn="dbname=mydb user=postgres password=secret host=localhost sslmode=require",
keepalives=1,
keepalives_idle=30,
keepalives_interval=10,
keepalives_count=5,
)
def safe_query(query, params=None):
"""Execute a query using the pool, handling broken connections."""
conn = db_pool.getconn()
try:
# Test the connection first
cursor = conn.cursor()
cursor.execute("SELECT 1;")
cursor.close()
# Execute the actual query
cursor = conn.cursor()
cursor.execute(query, params)
results = cursor.fetchall()
conn.commit()
cursor.close()
return results
except psycopg2.OperationalError as e:
logger.warning(f"Connection broken: {e}. Getting a fresh connection.")
db_pool.putconn(conn, close=True) # Discard the broken connection
# Get a fresh connection and retry
conn = db_pool.getconn()
cursor = conn.cursor()
cursor.execute(query, params)
results = cursor.fetchall()
conn.commit()
cursor.close()
return results
finally:
if conn:
db_pool.putconn(conn)
# Usage
results = safe_query("SELECT current_timestamp;")
print(f"Server time: {results[0][0]}")
For production applications, consider PgBouncer: a lightweight, dedicated PostgreSQL connection pooler that sits between your application and the database. It handles connection recycling, health checking, and idle timeout management far better than application-level pools.
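As a rough illustration of what that looks like (all values and the `mydb` name are placeholders; consult the PgBouncer documentation for your version), a minimal `pgbouncer.ini` might be:

```ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
; Recycle idle server connections before an upstream timeout can kill them
server_idle_timeout = 60
server_lifetime = 3600
```

Your application then connects to port 6432 instead of 5432, and PgBouncer absorbs server-side disconnects by handing out healthy connections from its pool.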
Solution 4: Fix Server-Side Timeout Settings
If the error occurs after a period of inactivity, the server (or a proxy) is likely killing idle connections.
Check current timeout settings:
```sql
-- Check idle connection timeout
SHOW idle_in_transaction_session_timeout;

-- Check statement timeout
SHOW statement_timeout;

-- Check TCP keepalive settings
SHOW tcp_keepalives_idle;
```
Adjust settings in postgresql.conf:
```ini
# Kill idle-in-transaction connections after 5 minutes (prevents resource locks)
idle_in_transaction_session_timeout = 300000   # milliseconds

# Set a generous statement timeout (0 = no timeout)
statement_timeout = 60000                      # 60 seconds

# Increase max connections if needed
max_connections = 200
```
For cloud databases (AWS RDS, Azure, etc.):
Cloud providers often have additional timeout settings at the load balancer or proxy level. For example, AWS RDS Proxy has a ConnectionBorrowTimeout and IdleClientTimeout. Check your cloud provider's documentation for these settings.
Solution 5: Handle OOM (Out of Memory) Issues
If PostgreSQL is being killed by the OS due to memory pressure, you'll see EOF detected errors on all connections simultaneously.
Diagnose
```bash
# Check if PostgreSQL was OOM-killed
dmesg | grep -i "killed process"
journalctl -u postgresql --since "1 hour ago" | grep -i "terminated\|killed"
```
Fix: Tune PostgreSQL memory settings (postgresql.conf):
```ini
# Reduce shared memory usage
shared_buffers = 256MB          # Start with 25% of available RAM
effective_cache_size = 768MB    # Set to 75% of available RAM
work_mem = 4MB                  # Per-operation memory; be conservative
maintenance_work_mem = 64MB

# Limit connections to reduce memory usage
max_connections = 100
```
Prevent OOM kills on Linux:
```bash
# Reduce the likelihood of PostgreSQL being OOM-killed (run as root).
# Note: pidof can return multiple PIDs; -s picks a single one. Make sure
# you target the postmaster process, not a backend worker.
echo -1000 > /proc/$(pidof -s postgres)/oom_score_adj
```
Solution 6: Django-Specific Configuration
If you're using Django, configure persistent connections with health checks:
```python
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'USER': 'postgres',
        'PASSWORD': 'secret',
        'HOST': 'your-db-host',
        'PORT': '5432',
        'CONN_MAX_AGE': 300,           # Reuse connections for 5 minutes
        'CONN_HEALTH_CHECKS': True,    # Django 4.1+: verify before reuse
        'OPTIONS': {
            'sslmode': 'require',
            'keepalives': 1,
            'keepalives_idle': 30,
            'keepalives_interval': 10,
            'keepalives_count': 5,
            'connect_timeout': 10,
        },
    }
}
```
Without CONN_HEALTH_CHECKS = True, Django may try to reuse a dead persistent connection, resulting in the EOF error being surfaced to the user. This setting was introduced in Django 4.1.
Diagnostic Checklist
| # | Check | Command / Action |
|---|---|---|
| 1 | PostgreSQL logs | Look for termination messages or OOM events |
| 2 | System logs | `dmesg \| grep -i oom` |
| 3 | Network stability | ping and traceroute to the database host |
| 4 | Connection count | SELECT count(*) FROM pg_stat_activity; |
| 5 | Idle connection age | SELECT pid, state, query_start FROM pg_stat_activity WHERE state = 'idle'; |
| 6 | Server memory usage | free -h on the database server |
| 7 | Proxy/LB timeouts | Check PgBouncer, RDS Proxy, or load balancer settings |
| 8 | SSL certificate expiry | openssl x509 -in server.crt -noout -dates |
Conclusion
The psycopg2 OperationalError: SSL SYSCALL error: EOF detected error means the PostgreSQL server closed your SSL connection without warning.
The cause is almost always external to your application code: server restarts, idle timeouts, memory pressure, or network issues.
The most effective defenses are:
- enable TCP keepalives to detect dead connections early
- implement retry logic for transient failures
- use connection pooling to automatically recycle broken connections
- check server logs to identify the root cause.
For production applications, combining connection pooling with keepalives and Django's CONN_HEALTH_CHECKS provides the most resilient setup against unexpected SSL disconnections.