feat(worker): Layer-3 payment prober — continuous money heartbeat (flag-gated OFF)#98

Merged
mastermanas805 merged 1 commit into master from feat/payment-probe-l3 on Jun 6, 2026
@mastermanas805
Member

Builds the Layer-3 payment-health synthetic the tooling forum identified as
the real gap (grep instant_payment_probe → none today). Forum verdict
docs/ci/FORUM-PAYMENT-E2E-TOOLING.md §4: the fastest, most-deterministic
money-path signal is an in-cluster iframe-free Go prober, not a browser
driver. Mirrors the auth_probe / deploy_probe / flow_synthetic pattern.

Depends on

The prober (internal/jobs/payment_probe.go)

A River periodic job (every 5 min), flag-gated PAYMENT_PROBE_ENABLED (default
OFF, fully inert until the operator lights it).

Prod-safe legs (non-charging, contract-only):

  • checkout_reachable — POST /api/v1/billing/checkout → non-5xx (a
    402/409/502 blocked-but-alive shape is a PASS while Razorpay
    live-recurring is operator-blocked, with 502 the one tolerated exception
    to the non-5xx rule; only a 5xx crash fails).
  • billing_state — GET /api/v1/billing → non-5xx.
  • invoices_reachable — GET /api/v1/billing/invoices → non-5xx.
  • webhook_security — POST /razorpay/webhook with a garbage unsigned
    payload MUST be rejected 400 invalid_signature (positive proof the
    signature gate is live; an accepted unsigned payload is a CRITICAL fail).
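The pass/fail rules for the four prod-safe legs can be sketched as plain status classifiers. This is a minimal illustration, not the code in payment_probe.go: the function names, the "pass"/"fail" label strings, and the `critical` flag are assumptions made for the example, and the 502 tolerance is applied only to the checkout leg because that is the only leg the description grants it to.

```go
package main

import (
	"fmt"
	"net/http"
)

// Result labels. The PR names pass/fail/degraded outcomes; the exact
// label strings used here are an assumption.
const (
	resultPass = "pass"
	resultFail = "fail"
)

// classifyNon5xx is a hypothetical helper for billing_state and
// invoices_reachable: any status below 500 counts as alive.
func classifyNon5xx(status int) string {
	if status >= 500 {
		return resultFail
	}
	return resultPass
}

// classifyCheckout is a hypothetical helper for checkout_reachable.
// It tolerates 502 as a blocked-but-alive shape, as the description
// states, and otherwise applies the non-5xx rule.
func classifyCheckout(status int) string {
	if status == http.StatusBadGateway {
		return resultPass
	}
	return classifyNon5xx(status)
}

// classifyWebhookSecurity is a hypothetical helper for webhook_security.
// Only a 400 carrying the invalid_signature error code passes. A 2xx
// means the unsigned payload was accepted, which is flagged as critical.
func classifyWebhookSecurity(status int, errCode string) (result string, critical bool) {
	if status == http.StatusBadRequest && errCode == "invalid_signature" {
		return resultPass, false
	}
	accepted := status >= 200 && status < 300
	return resultFail, accepted
}

func main() {
	for _, s := range []int{200, 402, 409, 500, 502} {
		fmt.Printf("checkout %d -> %s\n", s, classifyCheckout(s))
	}
	r, crit := classifyWebhookSecurity(200, "")
	fmt.Printf("unsigned webhook accepted -> %s (critical=%v)\n", r, crit)
}
```

Keeping the classifiers free of I/O makes each leg's outcome table easy to unit-test without a running API.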

Optional upgrade leg (only when the TEST webhook secret + test plan id are
set; skips clean/degraded otherwise — NO live Razorpay, NO real money):

  • upgrade_webhook_e2e — mint a fresh is_test_cohort=true team → inject a
    correctly-signed TEST-mode subscription.charged (HMAC-SHA256 raw body,
    the api's verifier scheme) → assert teams.plan_tier flipped (the rule-12
    downstream truth surface, NOT the webhook 200) → reap the cohort team
    (always, even on failure).
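The signing step of the upgrade leg, HMAC-SHA256 over the raw request body encoded as hex, can be sketched with the Go standard library alone. The function names, the placeholder secret, and the sample payload below are assumptions for illustration; the description only states the scheme, not how the prober or the api verifier is written.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signBody returns the hex-encoded HMAC-SHA256 of the raw body.
// Hypothetical helper; the real signer lives in payment_probe.go.
func signBody(secret, rawBody []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(rawBody)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyBody mirrors what a verifier would do: recompute the MAC over
// the same raw bytes and compare in constant time.
func verifyBody(secret, rawBody []byte, sigHex string) bool {
	got, err := hex.DecodeString(sigHex)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(rawBody)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("test-webhook-secret")            // placeholder, not a real secret
	body := []byte(`{"event":"subscription.charged"}`) // illustrative payload only
	sig := signBody(secret, body)
	fmt.Println(len(sig), verifyBody(secret, body, sig)) // prints: 64 true
}
```

The signature must be computed over the exact bytes that are sent; re-marshalling the JSON after signing would change the body and invalidate the MAC.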

Observability (rule 25 — infra PR ships in lockstep)

instant_payment_probe_outcome_total{leg,result} +
instant_payment_probe_latency_seconds{leg} (lazy *Vec, primed in
metrics_test.go) + the InstantPaymentProbe NR event (cohort=synthetic,
excluded from billing/revenue dashboards) + an audit_log row + structured
ERROR slog line on fail. result="degraded" is the config-unset /
slow-but-correct state and never pages — so the prober is inert AND
non-paging until the operator wires the flag + (upgrade leg) the test secret.
infra PR: alert + prom rule + dashboard tile + METRICS-CATALOG row.
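The {leg,result} outcome counter and the "degraded never pages" rule can be sketched as follows. This uses a mutex-guarded map as a standard-library stand-in for a metrics counter vector, so it is not how the real metric is registered; the type names, `shouldPage`, and the label strings are assumptions for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// outcomeKey mirrors the {leg,result} label pair of the outcome metric.
type outcomeKey struct{ leg, result string }

// outcomeCounter is a stand-in for a labelled counter. The map is
// created lazily on first use, so an idle prober emits nothing.
type outcomeCounter struct {
	mu     sync.Mutex
	counts map[outcomeKey]int
}

// inc records one outcome for the given leg and result.
func (c *outcomeCounter) inc(leg, result string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts == nil {
		c.counts = make(map[outcomeKey]int)
	}
	c.counts[outcomeKey{leg, result}]++
}

// get returns the count recorded for the given leg and result.
func (c *outcomeCounter) get(leg, result string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[outcomeKey{leg, result}]
}

// shouldPage is a hypothetical alerting predicate: only "fail" pages,
// so the config-unset "degraded" state stays silent.
func shouldPage(result string) bool {
	return result == "fail"
}

func main() {
	var c outcomeCounter
	c.inc("upgrade_webhook_e2e", "degraded") // test secret not configured
	c.inc("webhook_security", "pass")
	for _, r := range []string{"pass", "degraded", "fail"} {
		fmt.Printf("result=%s pages=%v\n", r, shouldPage(r))
	}
	fmt.Println(c.get("upgrade_webhook_e2e", "degraded"))
}
```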

Flag-off-inert proof

TestPaymentProbe_FlagOff_ProbesNothing: with Enabled=false the worker
no-ops — zero outcome emissions, no HTTP, no DB.
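A flag-off-inert check of this kind can be sketched by giving the worker an HTTP client whose transport only counts calls. This is an illustration in the spirit of TestPaymentProbe_FlagOff_ProbesNothing, not its actual body: `probeWorker`, `countingTransport`, `runOnce`, and the `api.invalid` base URL are hypothetical, and the sketch covers only the "no HTTP" part of the claim.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// countingTransport records how many requests the worker attempted and
// answers each with a canned 200, so no network is touched.
type countingTransport struct{ calls int }

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	t.calls++
	return &http.Response{
		StatusCode: http.StatusOK,
		Body:       io.NopCloser(strings.NewReader("{}")),
		Header:     make(http.Header),
		Request:    req,
	}, nil
}

// probeWorker is a hypothetical stand-in for the periodic job.
type probeWorker struct {
	Enabled bool
	Client  *http.Client
	BaseURL string
}

// Work returns immediately when the flag is off, before any I/O.
func (w *probeWorker) Work(ctx context.Context) error {
	if !w.Enabled {
		return nil
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, w.BaseURL+"/api/v1/billing", nil)
	if err != nil {
		return err
	}
	resp, err := w.Client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

// runOnce runs the worker a single time and reports the number of HTTP
// calls it made, or -1 if Work returned an error.
func runOnce(enabled bool) int {
	tr := &countingTransport{}
	w := &probeWorker{Enabled: enabled, Client: &http.Client{Transport: tr}, BaseURL: "http://api.invalid"}
	if err := w.Work(context.Background()); err != nil {
		return -1
	}
	return tr.calls
}

func main() {
	fmt.Println(runOnce(false), runOnce(true)) // prints: 0 1
}
```

Placing the flag check as the first statement in `Work` is what makes the inertness claim easy to prove: nothing that could emit a metric, open a connection, or touch the database runs before it.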

Verification

  • make gate green for the touched packages (go build ./... + go vet ./...
    + go test). The two TestIntegration_BillingReconciler_* failures seen
    only in the broad parallel ./... run are pre-existing shared-test-DB
    contention flakes: both PASS in isolation and run together under -p 1
    (62s); neither touches any file in this PR.
  • go test ./internal/jobs/ -run TestPaymentProbe -race green.
  • 97.9% per-func coverage on payment_probe.go (the few gaps are the
    unreachable build_request defensive branches, matching the flow_synthetic
    pattern).

Operator follow-up (awaiting)

  • PAYMENT_PROBE_ENABLED=true to activate.
  • RAZORPAY_TEST_WEBHOOK_SECRET + PAYMENT_PROBE_TEST_PLAN_ID_PRO to enable
    the optional upgrade leg (skips clean otherwise).

🤖 Generated with Claude Code

… (flag-gated OFF)

Builds the Layer-3 payment-health synthetic the tooling forum identified as the
real gap (`grep instant_payment_probe` → none today). Forum verdict
docs/ci/FORUM-PAYMENT-E2E-TOOLING.md §4: the fastest, most-deterministic
money-path signal is an in-cluster iframe-free Go prober, NOT a browser driver.
Mirrors the auth_probe / deploy_probe / flow_synthetic pattern exactly.

payment_probe.go — a River periodic job (every 5 min), flag-gated
PAYMENT_PROBE_ENABLED (default OFF, fully inert until the operator lights it):

  Prod-safe legs (non-charging, contract-only):
   - checkout_reachable  — POST /api/v1/billing/checkout → non-5xx (a
                           402/409/502 blocked-but-alive shape is a PASS while
                           Razorpay live-recurring is operator-blocked, with
                           502 the one tolerated exception to the non-5xx
                           rule; only a 5xx crash fails).
   - billing_state       — GET /api/v1/billing → non-5xx.
   - invoices_reachable  — GET /api/v1/billing/invoices → non-5xx.
   - webhook_security    — POST /razorpay/webhook with a garbage UNSIGNED
                           payload MUST be rejected 400 invalid_signature
                           (positive proof the signature gate is live; an
                           accepted unsigned payload is a CRITICAL fail).

  Optional upgrade leg (only when the TEST webhook secret + test plan id are
  set; skips clean/degraded otherwise — NO live Razorpay, NO real money):
   - upgrade_webhook_e2e — mint a fresh is_test_cohort=true team → inject a
                           correctly-signed TEST-mode subscription.charged
                           (HMAC-SHA256 raw body, the api's verifier scheme) →
                           assert teams.plan_tier flipped (the rule-12
                           downstream truth surface, NOT the webhook 200) →
                           reap the cohort team (always, even on failure).

Observability (rule 25, infra PR ships in lockstep):
  instant_payment_probe_outcome_total{leg,result} + instant_payment_probe_latency_seconds{leg}
  (lazy *Vec, primed in metrics_test.go) + the InstantPaymentProbe NR event
  (cohort=synthetic, excluded from billing/revenue dashboards) + an audit_log
  row + structured ERROR slog line on fail. result="degraded" is the
  config-unset / slow-but-correct state and never pages — so the prober is
  inert AND non-paging until the operator wires the flag + (for the upgrade
  leg) the test webhook secret.

Tests: flag-off proven inert (zero probes, no HTTP, no DB); each leg's
pass/fail/degraded outcome; the unsigned-webhook-rejected + accepted-unsigned
security cases; the upgrade tier-flip pass + the no-flip / webhook-non-200
fail (rule-12 discipline); cohort reap on every upgrade path; signer parity
(64-hex HMAC matching the api verifier); the leg vocabulary registry; the
panic boundary. 97.9% per-func coverage on payment_probe.go.
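The "cohort reap on every upgrade path" and "panic boundary" cases listed above can be sketched with two deferred calls. All names here (`upgradeDeps`, `runUpgradeLeg`, `fakeDeps`, the "pro" tier value) are hypothetical and chosen for the example; the sketch only shows one way such guarantees are commonly obtained in Go, not how payment_probe.go does it.

```go
package main

import (
	"errors"
	"fmt"
)

// upgradeDeps bundles the side effects of the upgrade leg as functions
// so they can be faked. Hypothetical shape, for illustration only.
type upgradeDeps struct {
	mintTeam func() (string, error)
	inject   func(teamID string) (int, error) // returns the webhook HTTP status
	planTier func(teamID string) (string, error)
	reapTeam func(teamID string)
}

// runUpgradeLeg passes only when the plan tier actually flipped. The
// reap is deferred right after the mint, so it runs on every return
// path and also while a panic unwinds; the outer deferred recover then
// turns the panic into a "fail" result.
func runUpgradeLeg(d upgradeDeps, wantTier string) (result string, err error) {
	defer func() {
		if r := recover(); r != nil {
			result, err = "fail", fmt.Errorf("panic in upgrade leg: %v", r)
		}
	}()
	teamID, err := d.mintTeam()
	if err != nil {
		return "fail", err
	}
	defer d.reapTeam(teamID)

	status, err := d.inject(teamID)
	if err != nil {
		return "fail", err
	}
	if status != 200 {
		return "fail", fmt.Errorf("webhook returned %d", status)
	}
	tier, err := d.planTier(teamID)
	if err != nil {
		return "fail", err
	}
	if tier != wantTier {
		return "fail", errors.New("plan_tier did not flip")
	}
	return "pass", nil
}

// fakeDeps builds in-memory dependencies and counts reaps in *reaped.
func fakeDeps(tier string, status int, panicOnInject bool, reaped *int) upgradeDeps {
	return upgradeDeps{
		mintTeam: func() (string, error) { return "team-test", nil },
		inject: func(string) (int, error) {
			if panicOnInject {
				panic("boom")
			}
			return status, nil
		},
		planTier: func(string) (string, error) { return tier, nil },
		reapTeam: func(string) { *reaped++ },
	}
}

func main() {
	var reaped int
	res, err := runUpgradeLeg(fakeDeps("free", 200, false, &reaped), "pro")
	fmt.Println(res, err, reaped)
}
```

Note that a webhook 200 with an unchanged tier still fails here, which matches the stated discipline of asserting on the downstream state rather than on the webhook response.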

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mastermanas805 force-pushed the feat/payment-probe-l3 branch from d797d6c to 092cda9 on June 6, 2026 14:56
mastermanas805 merged commit 8bcc320 into master on Jun 6, 2026
12 checks passed