Summary
Kubernetes readiness probes hanging on the /status endpoint typically indicate database connection pool exhaustion under load. When all available database connections are occupied, the health check waits indefinitely for a connection to verify database health, causing probe timeouts and pod failures. The resolution is to increase the PG_POOL_CONFIG_MAX_NUM_CLIENTS environment variable so the pool can accommodate the expected load.
Resolution Steps
Step 1: Gather diagnostic information
Ask the customer for the following details:

- Kubernetes readiness probe timeout settings
- Which specific health checks are timing out (check pod logs)
- Resource utilization (CPU/memory) during failures
- Current PG_POOL_CONFIG_MAX_NUM_CLIENTS setting (if unset, the default is 10)
- Number of API server replicas and expected QPS
Step 2: Identify connection pool exhaustion
Look for patterns indicating database connection pool issues:

- Normal CPU/memory usage but a hanging /status endpoint
- Manual curl to /status also hangs during load
- No explicit PG_POOL_CONFIG_MAX_NUM_CLIENTS configuration
Step 3: Configure connection pool size
Instruct the customer to set the database connection pool size based on their deployment method.

For Helm deployments
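How the variable is wired in depends on the chart. As a hedged sketch, assuming the chart exposes container environment variables under an `env` key (the key name, file name, and the value of 50 are assumptions, not a documented schema):

```yaml
# values-override.yaml — assumed chart schema; adjust to the actual chart
env:
  # Default is 10 when unset; 50 is an illustrative starting point
  PG_POOL_CONFIG_MAX_NUM_CLIENTS: "50"
```

Apply with `helm upgrade <release> <chart> --reuse-values -f values-override.yaml` and let the pods roll.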
For direct environment variable configuration
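For example, setting the variable directly on the Deployment (container name and value are illustrative):

```yaml
# Fragment of the API server Deployment spec (names are illustrative)
spec:
  template:
    spec:
      containers:
        - name: api-server
          env:
            - name: PG_POOL_CONFIG_MAX_NUM_CLIENTS
              value: "50"  # default is 10 when unset
```

Alternatively, `kubectl set env deployment/<name> PG_POOL_CONFIG_MAX_NUM_CLIENTS=50` applies the same change and triggers a rollout.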
Step 4: Verify the fix
After redeployment, confirm that:

- The /status endpoint responds consistently under load
- Readiness probes no longer time out
- Pods remain healthy during sustained traffic
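The first check can be scripted. A minimal sketch in Python, where the URL, request count, and timeout are illustrative assumptions:

```python
# Minimal spot check: hit /status repeatedly with a short timeout and
# count failures. A nonzero result suggests the pool is still exhausted.
import urllib.request


def check_status(url: str, n: int = 20, timeout: float = 2.0) -> int:
    """Return how many of n sequential requests failed or hung."""
    failures = 0
    for _ in range(n):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status != 200:
                    failures += 1
        except OSError:  # connection refused, reset, or timed out
            failures += 1
    return failures
```

For example, `check_status("http://localhost:8080/status")` should return 0 once the pool is sized correctly (port-forward to a pod if testing from outside the cluster).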
If connection pool adjustment doesn’t resolve the issue
Consider these additional factors:

- The database max_connections limit may need adjustment
- Other dependencies (Redis, S3) may be slow; check for rate limiting
- Temporarily increase the probe timeout and failure threshold for testing
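When reviewing max_connections, one rule of thumb is that the per-pod pool times the replica count, plus headroom for migrations and admin sessions, must fit under the server limit. Every number below is an illustrative assumption to be replaced with the customer's real values:

```python
# Illustrative capacity check; all numbers are assumptions.
replicas = 4               # API server pods
pool_per_pod = 20          # PG_POOL_CONFIG_MAX_NUM_CLIENTS
max_connections = 100      # Postgres server-side limit
headroom = 10              # reserve for migrations, admin sessions, etc.

total_pooled = replicas * pool_per_pod
assert total_pooled + headroom <= max_connections, (
    f"{total_pooled} pooled connections leave no headroom under "
    f"max_connections={max_connections}"
)
print(total_pooled)  # → 80
```

If the assertion fails, either lower the per-pod pool, reduce replicas, or raise max_connections on the database side.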