# Kubernetes Readiness Probe Hangs on /status Endpoint

export const plans_0 = "Enterprise"

export const deployments_0 = "Self-hosted"

export const data_plane_version_0 = undefined

export const use_case_0 = "Use case - Self-hosted Kubernetes deployments experiencing probe failures under load"

<Note>
  **Applies to:**

  * Plan - {plans_0}
  * Deployment - {deployments_0}
  * {data_plane_version_0}
  * {use_case_0}
</Note>

## Summary

Kubernetes readiness probes that hang on the `/status` endpoint typically indicate database connection pool exhaustion under load. When every connection in the pool is in use, the health check waits indefinitely for a free connection to verify database health, so the probe times out and Kubernetes marks the pod as not ready. To resolve this, increase the `PG_POOL_CONFIG_MAX_NUM_CLIENTS` environment variable to match the expected load.

## Resolution Steps

### Step 1: Gather diagnostic information

Ask the customer for the following details:

* Kubernetes readiness probe timeout settings
* Which specific health checks are timing out (check pod logs)
* Resource utilization (CPU/memory) during failures
* Current `PG_POOL_CONFIG_MAX_NUM_CLIENTS` setting (if unset, default is 10)
* Number of API server replicas and expected QPS

### Step 2: Identify connection pool exhaustion

Look for patterns indicating database connection pool issues:

* Normal CPU/memory usage but hanging `/status` endpoint
* Manual curl to `/status` also hangs during load
* No explicit `PG_POOL_CONFIG_MAX_NUM_CLIENTS` configuration
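
The manual `/status` check can be scripted so that a hang is distinguished from an outright error. The following is a minimal sketch, not part of the product: `probe_status` and `sample_status` are hypothetical helper names, and the URL passed in is whatever address the customer's API server is reachable at.

```python
import socket
import time
import urllib.error
import urllib.request
from collections import Counter


def probe_status(url: str, timeout_s: float = 5.0) -> tuple[str, float]:
    """Request `url` once; return ("ok" | "timeout" | "error", elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()
            outcome = "ok" if 200 <= resp.status < 300 else "error"
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx/5xx status.
        outcome = "error"
    except (socket.timeout, TimeoutError):
        # No response arrived within timeout_s: the request hung.
        outcome = "timeout"
    except urllib.error.URLError as exc:
        # Connection-phase failures are wrapped in URLError.
        timed_out = isinstance(exc.reason, (socket.timeout, TimeoutError))
        outcome = "timeout" if timed_out else "error"
    except OSError:
        outcome = "error"
    return outcome, time.monotonic() - start


def sample_status(url: str, attempts: int = 10, timeout_s: float = 5.0) -> Counter:
    """Probe `url` repeatedly and count how often each outcome occurs."""
    return Counter(probe_status(url, timeout_s)[0] for _ in range(attempts))
```

Running `sample_status` against the `/status` URL during a load test gives a count of each outcome. A high `timeout` count while CPU and memory stay normal is consistent with pool exhaustion rather than a resource shortage.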

### Step 3: Configure connection pool size

Instruct the customer to set the database connection pool size based on their deployment method:

#### For Helm deployments

```text theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
env:
  - name: PG_POOL_CONFIG_MAX_NUM_CLIENTS
    value: "50" # example value; choose one that fits your load
```

#### For direct environment variable configuration

```text theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
PG_POOL_CONFIG_MAX_NUM_CLIENTS=50
```

**Sizing guideline:** Start with 5-10 connections per API server replica, then adjust based on load testing results.
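
When choosing a value, it can help to check the combined total against the database's connection limit. The sketch below assumes each API server replica holds its own pool of up to `PG_POOL_CONFIG_MAX_NUM_CLIENTS` connections, which should be confirmed for the deployment. `pool_budget` and its `reserved` headroom parameter are hypothetical names used only for illustration.

```python
def pool_budget(
    replicas: int,
    pool_size_per_replica: int,
    db_max_connections: int,
    reserved: int = 0,
) -> dict:
    """Compare the worst-case connection count with the database limit.

    Assumes one pool per replica. `reserved` is headroom kept for other
    clients such as migrations, admin sessions, or monitoring.
    """
    total = replicas * pool_size_per_replica
    available = db_max_connections - reserved
    return {
        "total_pool_connections": total,
        "available_connections": available,
        "fits": total <= available,
    }
```

For example, `pool_budget(4, 50, 100)` reports 200 potential connections against a limit of 100, so either the pool size or the database's `max_connections` would need to change. PostgreSQL's `max_connections` typically defaults to 100.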

### Step 4: Verify the fix

After redeployment, confirm that:

* The `/status` endpoint responds consistently under load
* Readiness probes no longer time out
* Pods remain healthy during sustained traffic

### If connection pool adjustment doesn't resolve the issue

Consider these additional factors:

* Database `max_connections` limit may need adjustment
* Other dependencies (Redis, S3) may be slow - check for rate limiting
* Temporarily increase probe timeout and failure threshold for testing
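
When raising the probe timeout and failure threshold for testing, it can help to estimate how long a container may hang before Kubernetes marks it unready. The sketch below approximates that window from the standard `periodSeconds`, `timeoutSeconds`, and `failureThreshold` probe fields. It assumes every attempt times out and that the timeout is no longer than the period; `unready_after_seconds` is a hypothetical helper.

```python
def unready_after_seconds(
    period_s: int = 10,
    timeout_s: int = 1,
    failure_threshold: int = 3,
) -> int:
    """Approximate seconds from the first failing probe to 'not ready'.

    Defaults mirror the Kubernetes probe defaults. The container is marked
    unready after `failure_threshold` consecutive failures, so the last
    failing attempt starts (failure_threshold - 1) periods after the first
    and fails once its timeout elapses.
    """
    if failure_threshold < 1:
        raise ValueError("failure_threshold must be at least 1")
    return (failure_threshold - 1) * period_s + timeout_s
```

With the defaults this gives about 21 seconds. Raising the timeout to 5 seconds and the threshold to 6 extends it to about 55 seconds, which leaves more room to observe whether `/status` eventually responds.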
