diff --git a/skills/a7-recipe-circuit-breaker/SKILL.md b/skills/a7-recipe-circuit-breaker/SKILL.md index b13944c..1386978 100644 --- a/skills/a7-recipe-circuit-breaker/SKILL.md +++ b/skills/a7-recipe-circuit-breaker/SKILL.md @@ -2,9 +2,8 @@ name: a7-recipe-circuit-breaker description: >- Recipe skill for implementing circuit breaker patterns using the a7 CLI in API7 Enterprise Edition. - Covers the api-breaker plugin for automatic upstream circuit breaking, - configuring unhealthy thresholds, healthy recovery, response code - classification, and integration with health checks. + Covers the api-breaker plugin, unhealthy thresholds, healthy recovery, + response code classification, and integration with service health checks. version: "1.0.0" author: API7.ai Contributors license: Apache-2.0 @@ -13,91 +12,68 @@ metadata: apisix_version: ">=3.0.0" plugin_name: api-breaker a7_commands: + - a7 service create - a7 route create - a7 route update - a7 route get + - a7 config sync --- # a7-recipe-circuit-breaker ## Overview -A circuit breaker prevents cascading failures by detecting unhealthy upstream -services and temporarily stopping requests to them. When the upstream returns -too many errors, the circuit "opens" and API7 Enterprise Edition (API7 EE) returns errors immediately -without forwarding requests. After a cooldown period, it "half-opens" to test -if the upstream has recovered. +A circuit breaker prevents cascading failures by detecting unhealthy backend +responses and temporarily stopping requests to the failing service. API7 EE +implements this through the `api-breaker` plugin on routes. -API7 EE implements this via the `api-breaker` plugin, which tracks response -status codes and manages circuit state automatically across a gateway group. +Use the current service-backed route model: -## When to Use - -- Protect your API from cascading failures when an upstream goes down. -- Automatically stop sending traffic to failing backends. 
-- Allow failing services time to recover before retrying. -- Return fast error responses instead of waiting for timeouts. - -## Circuit Breaker States - -``` - ┌─────────┐ - │ CLOSED │ ← Normal operation: requests flow through - │(healthy) │ - └────┬─────┘ - │ Error count exceeds threshold - ▼ - ┌─────────┐ - │ OPEN │ ← Breaker tripped: returns configured status immediately - │(tripped) │ - └────┬─────┘ - │ After cooldown period - ▼ - ┌──────────┐ - │HALF-OPEN │ ← Test: allows one request through - │ (testing) │ - └─────┬────┘ - │ - ┌───────┴───────┐ - │ │ - Success Failure - │ │ - ▼ ▼ - CLOSED OPEN (longer cooldown) -``` +1. Create a service that owns the upstream backend. +2. Create a route with `service_id`. +3. Enable `api-breaker` on the route. ## Plugin Configuration Reference -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| `break_response_code` | integer | **Yes** | — | HTTP status code returned when circuit is open (e.g., 502, 503). | -| `break_response_body` | string | No | — | Response body returned when circuit is open. | -| `break_response_headers` | array[object] | No | — | Response headers when circuit is open. Format: `[{"key": "name", "value": "val"}]`. | -| `unhealthy.http_statuses` | array[integer] | No | `[500]` | HTTP status codes from upstream that count as unhealthy. | -| `unhealthy.failures` | integer | No | `3` | Number of consecutive unhealthy responses before opening the circuit. | -| `healthy.http_statuses` | array[integer] | No | `[200]` | HTTP status codes from upstream that count as healthy (for recovery). | -| `healthy.successes` | integer | No | `3` | Number of consecutive healthy responses to close the circuit. | -| `max_breaker_sec` | integer | No | `300` | Maximum circuit-open duration in seconds. Cooldown doubles each time but caps here. 
| - -## Breaker Timing +| Field | Required | Description | +|-------|----------|-------------| +| `break_response_code` | Yes | HTTP status returned when the circuit is open | +| `break_response_body` | No | Response body returned when open | +| `break_response_headers` | No | Headers returned when open | +| `unhealthy.http_statuses` | No | Upstream status codes counted as unhealthy | +| `unhealthy.failures` | No | Consecutive unhealthy responses before opening | +| `healthy.http_statuses` | No | Status codes counted as healthy for recovery | +| `healthy.successes` | No | Consecutive healthy responses before closing | +| `max_breaker_sec` | No | Maximum circuit-open duration | -When the circuit opens: -1. First open: **2 seconds** cooldown. -2. If it opens again: **4 seconds** (doubles). -3. Next: **8 seconds**, **16 seconds**, ... -4. Caps at `max_breaker_sec` (default 300s = 5 minutes). +## Step-by-Step: Enable Circuit Breaker -During cooldown, all requests get the `break_response_code` immediately. +### 1. Create a protected service -## Step-by-Step: Enable Circuit Breaker +```bash +a7 service create --gateway-group default -f - <<'EOF' +{ + "id": "backend-service", + "name": "backend-service", + "upstream": { + "type": "roundrobin", + "nodes": [ + {"host": "backend", "port": 8080, "weight": 1} + ] + } +} +EOF +``` -### 1. Basic circuit breaker +### 2. Create a route with `api-breaker` ```bash a7 route create --gateway-group default -f - <<'EOF' { "id": "protected-api", - "uri": "/api/*", + "name": "protected-api", + "paths": ["/api/*"], + "service_id": "backend-service", "plugins": { "api-breaker": { "break_response_code": 502, @@ -111,28 +87,24 @@ a7 route create --gateway-group default -f - <<'EOF' }, "max_breaker_sec": 300 } - }, - "upstream": { - "type": "roundrobin", - "nodes": { - "backend:8080": 1 - } } } EOF ``` -After 3 consecutive 500/502/503 responses, the circuit opens and returns 502 -immediately. After cooldown, it tests with one request. 
If 3 consecutive 200s -come back, the circuit closes and normal operation resumes. +After three consecutive 500/502/503 responses, the circuit opens and returns +502 immediately. After cooldown, API7 EE tests recovery and closes the circuit +after enough healthy responses. -### 2. Circuit breaker with custom error body +### 3. Custom error response ```bash -a7 route create --gateway-group default -f - <<'EOF' +a7 route update protected-api --gateway-group default -f - <<'EOF' { - "id": "api-with-error-body", - "uri": "/api/*", + "id": "protected-api", + "name": "protected-api", + "paths": ["/api/*"], + "service_id": "backend-service", "plugins": { "api-breaker": { "break_response_code": 503, @@ -151,78 +123,46 @@ a7 route create --gateway-group default -f - <<'EOF' }, "max_breaker_sec": 60 } - }, - "upstream": { - "type": "roundrobin", - "nodes": { - "backend:8080": 1 - } } } EOF ``` -### 3. Sensitive circuit breaker (trips on first error) +### 4. Combine with health checks -```json -{ - "plugins": { - "api-breaker": { - "break_response_code": 503, - "unhealthy": { - "http_statuses": [500, 502, 503], - "failures": 1 - }, - "healthy": { - "http_statuses": [200], - "successes": 1 - }, - "max_breaker_sec": 30 - } - } -} -``` - -Trips on the very first 5xx error. Recovers after one successful response. - -## Combining with Health Checks - -For production, combine the circuit breaker with upstream health checks. -The circuit breaker handles per-route protection while health checks manage -per-node health at the upstream level. +For production, define health checks on the service upstream and keep +`api-breaker` on the route. Health checks manage node health; the circuit +breaker protects this route from repeated upstream failures. 
```bash -# Create upstream with health checks -a7 upstream create --gateway-group default -f - <<'EOF' +a7 service create --gateway-group default -f - <<'EOF' { - "id": "monitored-backend", - "type": "roundrobin", - "nodes": { - "backend-1:8080": 1, - "backend-2:8080": 1 - }, - "checks": { - "active": { - "type": "http", - "http_path": "/health", - "healthy": { - "interval": 5, - "successes": 2 - }, - "unhealthy": { - "interval": 3, - "http_failures": 3 + "id": "monitored-backend-service", + "name": "monitored-backend-service", + "upstream": { + "type": "roundrobin", + "nodes": [ + {"host": "backend-1", "port": 8080, "weight": 1}, + {"host": "backend-2", "port": 8080, "weight": 1} + ], + "checks": { + "active": { + "type": "http", + "http_path": "/health", + "healthy": {"interval": 5, "successes": 2}, + "unhealthy": {"interval": 3, "http_failures": 3} } } } } EOF -# Create route with circuit breaker a7 route create --gateway-group default -f - <<'EOF' { "id": "api", - "uri": "/api/*", + "name": "api", + "paths": ["/api/*"], + "service_id": "monitored-backend-service", "plugins": { "api-breaker": { "break_response_code": 503, @@ -235,20 +175,30 @@ a7 route create --gateway-group default -f - <<'EOF' "successes": 3 } } - }, - "upstream_id": "monitored-backend" + } } EOF ``` -## Config Sync Example +## Config Sync ```yaml version: "1" -gateway_group: default +services: + - id: backend-service + name: backend-service + upstream: + type: roundrobin + nodes: + - host: backend + port: 8080 + weight: 1 routes: - id: protected-api - uri: /api/* + name: protected-api + paths: + - /api/* + service_id: backend-service plugins: api-breaker: break_response_code: 503 @@ -265,21 +215,16 @@ routes: http_statuses: [200] successes: 3 max_breaker_sec: 300 - upstream_id: backend -upstreams: - - id: backend - type: roundrobin - nodes: - "backend:8080": 1 ``` ## Troubleshooting | Symptom | Cause | Fix | |---------|-------|-----| -| Circuit never opens | `unhealthy.http_statuses` doesn't 
include the error code | Add the actual error codes your upstream returns | -| Circuit stays open too long | `max_breaker_sec` too high | Lower `max_breaker_sec` for faster recovery | -| Circuit flaps open/closed | Threshold too low with intermittent errors | Increase `unhealthy.failures` threshold | -| 502 from API7 EE (not circuit breaker) | Upstream truly unreachable (connection refused) | Connection errors also count toward unhealthy threshold | -| Recovery too slow | `healthy.successes` too high | Lower `healthy.successes` for faster recovery | -| Command failed with 403 | RBAC permission issue | Ensure your token has permission to modify routes in the gateway group | +| Circuit never opens | `unhealthy.http_statuses` misses the real error code | Add the actual upstream error codes | +| Circuit stays open too long | `max_breaker_sec` too high | Lower `max_breaker_sec` | +| Circuit flaps | Threshold too low for intermittent errors | Increase `unhealthy.failures` | +| API7 EE returns 502 while the circuit is still closed | Backend is unreachable (connection refused) | Check backend reachability; connection errors also count toward unhealthy thresholds | +| Recovery too slow | `healthy.successes` too high | Lower `healthy.successes` | +| Route not using breaker | Plugin attached to the wrong route | Verify with `a7 route get <route-id> --gateway-group default -o json` | +| Command failed with 403 | RBAC permission issue | Ensure your token can modify routes in the gateway group | diff --git a/skills/a7-recipe-health-check/SKILL.md b/skills/a7-recipe-health-check/SKILL.md index 60c02da..a016b21 100644 --- a/skills/a7-recipe-health-check/SKILL.md +++ b/skills/a7-recipe-health-check/SKILL.md @@ -1,10 +1,9 @@ --- name: a7-recipe-health-check description: >- - Recipe skill for configuring upstream health checks using the a7 CLI in API7 Enterprise Edition. 
- Covers active health checks (HTTP probing), passive health checks - (response analysis), combining both, configuring healthy/unhealthy - thresholds, and monitoring upstream node status.
+ Recipe skill for configuring backend health checks using the a7 CLI in API7 Enterprise Edition. + Covers active health checks, passive health checks, combining both, + healthy/unhealthy thresholds, and service-backed route wiring. version: "1.0.0" author: API7.ai Contributors license: Apache-2.0 @@ -12,8 +11,10 @@ metadata: category: recipe apisix_version: ">=3.0.0" a7_commands: - - a7 upstream create - - a7 upstream get + - a7 service create + - a7 service get + - a7 route create + - a7 route list - a7 config sync --- @@ -21,83 +22,70 @@ metadata: ## Overview -Health checks monitor upstream backend nodes and automatically remove -unhealthy nodes from the load balancer pool. API7 Enterprise Edition (API7 EE) supports two types: +Health checks monitor backend nodes and remove unhealthy nodes from load +balancing. In current API7 EE usage, define the upstream and health check +configuration on a service, then attach routes to that service with +`service_id`. -- **Active**: API7 EE periodically probes each node with HTTP/HTTPS/TCP requests. -- **Passive**: API7 EE analyzes real traffic responses to detect failures. +API7 EE supports: -Use both together for the most robust setup across your gateway groups. +- Active checks: gateway probes each node. +- Passive checks: gateway observes real traffic responses. -## When to Use - -- Automatically remove failing backend nodes from rotation. -- Detect and recover from backend failures without manual intervention. -- Ensure high availability across multiple backend instances. -- Monitor backend health status via the a7 CLI. +Use both for production services that need automatic failure detection and +recovery. 
## Health Check Configuration Reference ### Active Health Check -| Field | Type | Default | Description | -|-------|------|---------|-------------| -| `checks.active.type` | string | `"http"` | Check type: `"http"`, `"https"`, or `"tcp"` | -| `checks.active.http_path` | string | `"/"` | HTTP path to probe | -| `checks.active.host` | string | — | Host header for HTTP probes | -| `checks.active.port` | integer | — | Override port for probing (default: use node port) | -| `checks.active.timeout` | number | `1` | Probe timeout in seconds | -| `checks.active.concurrency` | integer | `10` | Number of concurrent probes | -| `checks.active.https_verify_certificate` | boolean | `true` | Verify TLS certificate for HTTPS probes | -| `checks.active.req_headers` | array[string] | — | Additional request headers for probes | -| `checks.active.healthy.interval` | integer | `1` | Seconds between probes for healthy nodes | -| `checks.active.healthy.successes` | integer | `2` | Consecutive successes to mark node healthy | -| `checks.active.healthy.http_statuses` | array[integer] | `[200, 302]` | HTTP codes considered healthy | -| `checks.active.unhealthy.interval` | integer | `1` | Seconds between probes for unhealthy nodes | -| `checks.active.unhealthy.http_failures` | integer | `5` | Consecutive HTTP failures to mark unhealthy | -| `checks.active.unhealthy.tcp_failures` | integer | `2` | Consecutive TCP failures to mark unhealthy | -| `checks.active.unhealthy.timeouts` | integer | `3` | Consecutive timeouts to mark unhealthy | -| `checks.active.unhealthy.http_statuses` | array[integer] | `[429, 404, 500, 501, 502, 503, 504, 505]` | HTTP codes considered unhealthy | +| Field | Description | +|-------|-------------| +| `upstream.checks.active.type` | `http`, `https`, or `tcp` | +| `upstream.checks.active.http_path` | HTTP path to probe | +| `upstream.checks.active.healthy.successes` | Consecutive successes to mark healthy | +| `upstream.checks.active.unhealthy.http_failures` | 
Consecutive HTTP failures to mark unhealthy | +| `upstream.checks.active.unhealthy.timeouts` | Consecutive timeouts to mark unhealthy | ### Passive Health Check -| Field | Type | Default | Description | -|-------|------|---------|-------------| -| `checks.passive.type` | string | `"http"` | Check type: `"http"`, `"https"`, or `"tcp"` | -| `checks.passive.healthy.successes` | integer | `5` | Consecutive successes to mark healthy | -| `checks.passive.healthy.http_statuses` | array[integer] | `[200, 201, 202, ..., 399]` | HTTP codes considered healthy | -| `checks.passive.unhealthy.http_failures` | integer | `5` | Consecutive failures to mark unhealthy | -| `checks.passive.unhealthy.tcp_failures` | integer | `2` | Consecutive TCP failures to mark unhealthy | -| `checks.passive.unhealthy.timeouts` | integer | `7` | Consecutive timeouts to mark unhealthy | -| `checks.passive.unhealthy.http_statuses` | array[integer] | `[429, 500, 503]` | HTTP codes considered unhealthy | +| Field | Description | +|-------|-------------| +| `upstream.checks.passive.type` | `http`, `https`, or `tcp` | +| `upstream.checks.passive.unhealthy.http_statuses` | Status codes treated as unhealthy | +| `upstream.checks.passive.unhealthy.http_failures` | Consecutive failures to mark unhealthy | +| `upstream.checks.passive.healthy.successes` | Consecutive successes to mark healthy | ## Step-by-Step: Configure Health Checks -### 1. Active HTTP health check +### 1. 
Create a service with active HTTP health checks ```bash -a7 upstream create --gateway-group default -f - <<'EOF' +a7 service create --gateway-group default -f - <<'EOF' { - "id": "backend", - "type": "roundrobin", - "nodes": { - "backend-1:8080": 1, - "backend-2:8080": 1, - "backend-3:8080": 1 - }, - "checks": { - "active": { - "type": "http", - "http_path": "/health", - "healthy": { - "interval": 5, - "successes": 2, - "http_statuses": [200] - }, - "unhealthy": { - "interval": 3, - "http_failures": 3, - "http_statuses": [500, 502, 503] + "id": "backend-service", + "name": "backend-service", + "upstream": { + "type": "roundrobin", + "nodes": [ + {"host": "backend-1", "port": 8080, "weight": 1}, + {"host": "backend-2", "port": 8080, "weight": 1}, + {"host": "backend-3", "port": 8080, "weight": 1} + ], + "checks": { + "active": { + "type": "http", + "http_path": "/health", + "healthy": { + "interval": 5, + "successes": 2, + "http_statuses": [200] + }, + "unhealthy": { + "interval": 3, + "http_failures": 3, + "http_statuses": [500, 502, 503] + } } } } @@ -105,85 +93,52 @@ a7 upstream create --gateway-group default -f - <<'EOF' EOF ``` -API7 EE probes `/health` on each node: -- Every 5s for healthy nodes. -- Every 3s for unhealthy nodes. -- 3 consecutive failures → node removed from rotation in the `default` group. -- 2 consecutive successes → node restored. - -### 2. Passive health check (analyze real traffic) +### 2. 
Create a route that uses the service ```bash -a7 upstream create --gateway-group default -f - <<'EOF' +a7 route create --gateway-group default -f - <<'EOF' { - "id": "backend-passive", - "type": "roundrobin", - "nodes": { - "backend-1:8080": 1, - "backend-2:8080": 1 - }, - "checks": { - "passive": { - "type": "http", - "unhealthy": { - "http_failures": 3, - "http_statuses": [500, 502, 503], - "timeouts": 3 - }, - "healthy": { - "successes": 5, - "http_statuses": [200, 201, 202, 203, 204] - } - } - } + "id": "api", + "name": "api", + "paths": ["/api/*"], + "service_id": "backend-service" } EOF ``` -No probing — API7 EE watches real traffic responses. After 3 consecutive 5xx -errors, the node is removed. After 5 consecutive successes, it's restored. +Health checks run only for services that are referenced by at least one route. +Verify route wiring with: -**Note**: Passive-only health checks cannot recover a node that receives no -traffic. Combine with active checks for full coverage. +```bash +a7 service get backend-service --gateway-group default --output json +a7 route list --gateway-group default --service-id backend-service --output json +``` -### 3. Combined active + passive (recommended for production) +### 3. 
Passive health check ```bash -a7 upstream create --gateway-group default -f - <<'EOF' +a7 service create --gateway-group default -f - <<'EOF' { - "id": "production-backend", - "type": "roundrobin", - "nodes": { - "backend-1:8080": 1, - "backend-2:8080": 1, - "backend-3:8080": 1 - }, - "checks": { - "active": { - "type": "http", - "http_path": "/health", - "healthy": { - "interval": 5, - "successes": 2, - "http_statuses": [200] - }, - "unhealthy": { - "interval": 2, - "http_failures": 3, - "timeouts": 2, - "http_statuses": [500, 502, 503, 504] - } - }, - "passive": { - "type": "http", - "unhealthy": { - "http_failures": 3, - "http_statuses": [500, 502, 503], - "timeouts": 3 - }, - "healthy": { - "successes": 3, - "http_statuses": [200, 201, 204] + "id": "backend-passive-service", + "name": "backend-passive-service", + "upstream": { + "type": "roundrobin", + "nodes": [ + {"host": "backend-1", "port": 8080, "weight": 1}, + {"host": "backend-2", "port": 8080, "weight": 1} + ], + "checks": { + "passive": { + "type": "http", + "unhealthy": { + "http_failures": 3, + "http_statuses": [500, 502, 503], + "timeouts": 3 + }, + "healthy": { + "successes": 5, + "http_statuses": [200, 201, 202, 203, 204] + } } } } @@ -191,137 +146,141 @@ a7 upstream create --gateway-group default -f - <<'EOF' EOF ``` -### 4. Verify the referencing route and upstream +Attach the passive-check service to a route before sending traffic through it: ```bash -# Current a7 does not expose a standalone upstream-health command. -# Verify the upstream/route wiring and use gateway observability for node state. 
-a7 upstream get backend --gateway-group default --output json -a7 route list --gateway-group default --output json -``` - -## Common Patterns - -### TCP health check (non-HTTP services) - -```json +a7 route create --gateway-group default -f - <<'EOF' { - "checks": { - "active": { - "type": "tcp", - "healthy": { - "interval": 5, - "successes": 2 - }, - "unhealthy": { - "interval": 2, - "tcp_failures": 3, - "timeouts": 2 - } - } - } + "id": "api-passive", + "name": "api-passive", + "paths": ["/api-passive/*"], + "service_id": "backend-passive-service" } +EOF ``` -### HTTPS health check with certificate verification +Passive-only checks cannot recover a node that receives no traffic. Combine +passive checks with active checks for full recovery coverage. -```json +### 4. Combined active + passive checks + +```bash +a7 service create --gateway-group default -f - <<'EOF' { - "checks": { - "active": { - "type": "https", - "http_path": "/health", - "https_verify_certificate": true, - "healthy": { - "interval": 10, - "successes": 2, - "http_statuses": [200] + "id": "production-backend-service", + "name": "production-backend-service", + "upstream": { + "type": "roundrobin", + "nodes": [ + {"host": "backend-1", "port": 8080, "weight": 1}, + {"host": "backend-2", "port": 8080, "weight": 1}, + {"host": "backend-3", "port": 8080, "weight": 1} + ], + "checks": { + "active": { + "type": "http", + "http_path": "/health", + "healthy": { + "interval": 5, + "successes": 2, + "http_statuses": [200] + }, + "unhealthy": { + "interval": 2, + "http_failures": 3, + "timeouts": 2, + "http_statuses": [500, 502, 503, 504] + } }, - "unhealthy": { - "interval": 5, - "http_failures": 3 + "passive": { + "type": "http", + "unhealthy": { + "http_failures": 3, + "http_statuses": [500, 502, 503], + "timeouts": 3 + }, + "healthy": { + "successes": 3, + "http_statuses": [200, 201, 204] + } } } } } +EOF ``` -### Custom probe headers (for auth-protected health endpoints) +Attach or update a route to 
reference the combined-check service: -```json +```bash +a7 route create --gateway-group default -f - <<'EOF' { - "checks": { - "active": { - "type": "http", - "http_path": "/internal/health", - "host": "health.internal", - "req_headers": [ - "Authorization: Bearer health-check-token", - "X-Health-Check: true" - ], - "healthy": { - "interval": 10, - "successes": 2 - }, - "unhealthy": { - "interval": 5, - "http_failures": 3 - } - } - } + "id": "api-production", + "name": "api-production", + "paths": ["/api-production/*"], + "service_id": "production-backend-service" } +EOF ``` -## Config Sync Example +## Config Sync ```yaml version: "1" -gateway_group: default -upstreams: - - id: production-backend - type: roundrobin - nodes: - "backend-1:8080": 1 - "backend-2:8080": 1 - "backend-3:8080": 1 - checks: - active: - type: http - http_path: /health - healthy: - interval: 5 - successes: 2 - http_statuses: [200] - unhealthy: - interval: 2 - http_failures: 3 - timeouts: 2 - http_statuses: [500, 502, 503, 504] - passive: - type: http - unhealthy: - http_failures: 3 - http_statuses: [500, 502, 503] - timeouts: 3 - healthy: - successes: 3 - http_statuses: [200, 201, 204] +services: + - id: production-backend-service + name: production-backend-service + upstream: + type: roundrobin + nodes: + - host: backend-1 + port: 8080 + weight: 1 + - host: backend-2 + port: 8080 + weight: 1 + - host: backend-3 + port: 8080 + weight: 1 + checks: + active: + type: http + http_path: /health + healthy: + interval: 5 + successes: 2 + http_statuses: [200] + unhealthy: + interval: 2 + http_failures: 3 + timeouts: 2 + http_statuses: [500, 502, 503, 504] + passive: + type: http + unhealthy: + http_failures: 3 + http_statuses: [500, 502, 503] + timeouts: 3 + healthy: + successes: 3 + http_statuses: [200, 201, 204] routes: - id: api - uri: /api/* - upstream_id: production-backend + name: api + paths: + - /api/* + service_id: production-backend-service ``` ## Troubleshooting | Symptom | Cause | Fix | 
|---------|-------|-----| -| Health checks not running | No route references the upstream | Health checks only run for upstreams attached to at least one route | -| All nodes marked unhealthy | Health endpoint returns wrong status code | Verify `http_statuses` includes your health endpoint's response code | -| Node not recovering | Passive-only: no traffic reaches unhealthy node | Add active health checks for recovery | -| Probe hitting wrong endpoint | Default `http_path` is `/` | Set `http_path` to your actual health endpoint | +| Health checks not running | Route does not reference the service | Verify with `a7 route list --gateway-group default --service-id <service-id>` | +| All nodes marked unhealthy | Health endpoint returns an unexpected status | Verify `healthy.http_statuses` includes the code your health endpoint returns | +| Node not recovering | Passive-only checks have no traffic to observe | Add active health checks | +| Probe hits wrong endpoint | Default `http_path` is `/` | Set `http_path` to the real health endpoint | | TLS probe fails | Certificate verification fails | Set `https_verify_certificate: false` or fix certificates | -| Health checks too aggressive | Low thresholds with flaky endpoints | Increase `failures` threshold and `interval` | -| No standalone health command | Current a7 does not expose upstream health status | Verify service/route config with `a7 service get` and use gateway observability | +| No standalone health command | Current a7 does not expose upstream health status | Verify config with `a7 service get <service-id> --gateway-group default` and use gateway observability | | Command failed with 401 | Invalid token | Refresh your token using `a7 context create` | -| Upstream not found | Different gateway group | Ensure `--gateway-group` matches the group where upstream was created | +| Service not found | Different gateway group | Ensure `--gateway-group` matches where the service was created |
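Both recipes rely on the same wiring rule: routes reach a backend through `service_id`, and the health-check recipe states that health checks run only for services referenced by at least one route. A local check along the following lines could catch unwired services in a config-sync file before running `a7 config sync`. This is a minimal sketch, not part of a7: `check_service_wiring` is a hypothetical helper, and it assumes the YAML file has already been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`, not shown) using the `services`/`routes` shape from the Config Sync examples.

```python
from typing import Any


def check_service_wiring(config: dict[str, Any]) -> dict[str, list[str]]:
    """Hypothetical lint: compare route service_id values with defined services.

    Returns routes whose service_id is missing or not defined in this file,
    and services that no route references (per the health-check recipe,
    health checks on such services would not run).
    """
    service_ids = {svc["id"] for svc in config.get("services", [])}
    referenced: set[str] = set()
    dangling_routes: list[str] = []

    for route in config.get("routes", []):
        svc_id = route.get("service_id")
        if svc_id in service_ids:
            referenced.add(svc_id)
        else:
            # No service_id, or it points at a service not defined in this file.
            dangling_routes.append(route["id"])

    return {
        "dangling_routes": sorted(dangling_routes),
        "unreferenced_services": sorted(service_ids - referenced),
    }


# Same shape as the Config Sync examples, already loaded into a dict.
config = {
    "version": "1",
    "services": [
        {"id": "backend-service", "name": "backend-service"},
        {"id": "monitored-backend-service", "name": "monitored-backend-service"},
    ],
    "routes": [
        {"id": "protected-api", "paths": ["/api/*"], "service_id": "backend-service"},
    ],
}

print(check_service_wiring(config))
# {'dangling_routes': [], 'unreferenced_services': ['monitored-backend-service']}
```

The sketch only inspects one file, so a route whose service is defined elsewhere in the gateway group would also be flagged; treat the output as a prompt to confirm wiring with `a7 route list --service-id`, not as a verdict.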