feat(worker): Layer-3 payment prober — continuous money heartbeat (flag-gated OFF)#98
Merged
Builds the Layer-3 payment-health synthetic the tooling forum identified as the
real gap (`grep instant_payment_probe` → none today). Forum verdict
docs/ci/FORUM-PAYMENT-E2E-TOOLING.md §4: the fastest, most-deterministic
money-path signal is an in-cluster iframe-free Go prober, NOT a browser driver.
Mirrors the auth_probe / deploy_probe / flow_synthetic pattern exactly.
payment_probe.go — a River periodic job (every 5 min), flag-gated
PAYMENT_PROBE_ENABLED (default OFF, fully inert until the operator lights it):
Prod-safe legs (non-charging, contract-only):
- checkout_reachable: POST /api/v1/billing/checkout → non-5xx (a
  402/409/502 blocked-but-alive shape is a PASS while Razorpay
  live-recurring is operator-blocked; only a 5xx crash fails).
- billing_state: GET /api/v1/billing → non-5xx.
- invoices_reachable: GET /api/v1/billing/invoices → non-5xx.
- webhook_security: POST /razorpay/webhook with a garbage UNSIGNED
  payload MUST be rejected 400 invalid_signature (positive proof the
  signature gate is live; an accepted unsigned payload is a CRITICAL
  fail).
Optional upgrade leg (only when the TEST webhook secret + test plan id are
set; skips clean/degraded otherwise — NO live Razorpay, NO real money):
- upgrade_webhook_e2e: mint a fresh is_test_cohort=true team → inject a
  correctly-signed TEST-mode subscription.charged (HMAC-SHA256 raw
  body, the api's verifier scheme) → assert teams.plan_tier flipped
  (the rule-12 downstream truth surface, NOT the webhook 200) → reap
  the cohort team (always, even on failure).
Observability (rule 25, infra PR ships in lockstep):
instant_payment_probe_outcome_total{leg,result} + instant_payment_probe_latency_seconds{leg}
(lazy *Vec, primed in metrics_test.go) + the InstantPaymentProbe NR event
(cohort=synthetic, excluded from billing/revenue dashboards) + an audit_log
row + structured ERROR slog line on fail. result="degraded" is the
config-unset / slow-but-correct state and never pages — so the prober is
inert AND non-paging until the operator wires the flag + (for the upgrade
leg) the test webhook secret.
Tests: flag-off proven inert (zero probes, no HTTP, no DB); each leg's
pass/fail/degraded outcome; the unsigned-webhook-rejected + accepted-unsigned
security cases; the upgrade tier-flip pass + the no-flip / webhook-non-200
fail (rule-12 discipline); cohort reap on every upgrade path; signer parity
(64-hex HMAC matching the api verifier); the leg vocabulary registry; the
panic boundary. 97.9% per-func coverage on payment_probe.go.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
d797d6c to
092cda9
Builds the Layer-3 payment-health synthetic the tooling forum identified as
the real gap (`grep instant_payment_probe` → none today). Forum verdict
`docs/ci/FORUM-PAYMENT-E2E-TOOLING.md` §4: the fastest, most-deterministic
money-path signal is an in-cluster iframe-free Go prober, not a browser
driver. Mirrors the `auth_probe` / `deploy_probe` / `flow_synthetic` pattern.
Depends on
- The common change adding the `InstantPaymentProbe` event + `AttrHTTPStatus`. Worker CI checks out common's master as the sibling `../common` (`replace` directive), so test(jobs): email-forwarder coverage ≥95% #48 must merge first for build-and-test to pass.
The prober (`internal/jobs/payment_probe.go`)
A River periodic job (every 5 min), flag-gated `PAYMENT_PROBE_ENABLED` (default
OFF, fully inert until the operator lights it).
Prod-safe legs (non-charging, contract-only):
- `checkout_reachable`: POST `/api/v1/billing/checkout` → non-5xx (a 402/409/502 blocked-but-alive shape is a PASS while Razorpay live-recurring is operator-blocked; only a 5xx crash fails).
- `billing_state`: GET `/api/v1/billing` → non-5xx.
- `invoices_reachable`: GET `/api/v1/billing/invoices` → non-5xx.
- `webhook_security`: POST `/razorpay/webhook` with a garbage unsigned payload MUST be rejected `400 invalid_signature` (positive proof the signature gate is live; an accepted unsigned payload is a CRITICAL fail).
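The pass/fail rules for the prod-safe legs can be sketched as a pair of pure functions. This is a minimal illustration, not the code in `payment_probe.go`: the names `classifyReachable`, `classifyWebhookSecurity`, and `blockedButAlive` are hypothetical, and checking the 402/409/502 allowlist before the generic 5xx rule is an assumption made so that the 502 shape listed above still passes for checkout.

```go
package main

import "strings"

// Result is the outcome label a leg reports. The exact strings are assumed.
type Result string

const (
	Pass Result = "pass"
	Fail Result = "fail"
)

// blockedButAlive holds the statuses the description names as acceptable for
// checkout_reachable while Razorpay live-recurring is operator-blocked.
var blockedButAlive = map[int]bool{402: true, 409: true, 502: true}

// classifyReachable (hypothetical helper) covers the three reachability legs.
// Assumption: the allowlist is consulted before the 5xx rule, and only for
// the checkout leg.
func classifyReachable(leg string, status int) Result {
	if leg == "checkout_reachable" && blockedButAlive[status] {
		return Pass
	}
	if status >= 500 {
		return Fail
	}
	return Pass
}

// classifyWebhookSecurity (hypothetical helper) passes only when the unsigned
// payload is rejected with 400 and the body carries invalid_signature.
// Anything else, including an accepted payload, is a fail.
func classifyWebhookSecurity(status int, body string) Result {
	if status == 400 && strings.Contains(body, "invalid_signature") {
		return Pass
	}
	return Fail
}
```

Keeping the classification free of I/O makes each leg's outcome table easy to test without a live API.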
Optional upgrade leg (only when the TEST webhook secret + test plan id are
set; skips clean/degraded otherwise; NO live Razorpay, NO real money):
- `upgrade_webhook_e2e`: mint a fresh `is_test_cohort=true` team → inject a correctly-signed TEST-mode `subscription.charged` (HMAC-SHA256 raw body, the api's verifier scheme) → assert `teams.plan_tier` flipped (the rule-12 downstream truth surface, NOT the webhook 200) → reap the cohort team (always, even on failure).
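The upgrade leg's sequence (sign, inject, assert the downstream tier, always reap) can be sketched as follows. This is an illustration under stated assumptions rather than the PR's implementation: `upgradeDeps`, `runUpgradeLeg`, and the JSON payload shape are hypothetical stand-ins for the worker's real HTTP and DB clients, and the signature is assumed to be the lowercase hex encoding of the HMAC.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signBody returns the hex-encoded HMAC-SHA256 of the raw body (64 hex chars).
func signBody(secret string, body []byte) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// upgradeDeps is a hypothetical seam for the DB and HTTP calls the leg makes.
type upgradeDeps struct {
	MintTestTeam func() (teamID string, err error)
	PostWebhook  func(body []byte, signature string) (status int, err error)
	PlanTier     func(teamID string) (string, error)
	ReapTeam     func(teamID string) error
}

// runUpgradeLeg fails unless teams.plan_tier actually flipped; a webhook 200
// alone is not accepted as proof. The deferred reap runs on every path.
func runUpgradeLeg(d upgradeDeps, secret, wantTier string) (err error) {
	teamID, err := d.MintTestTeam()
	if err != nil {
		return fmt.Errorf("mint cohort team: %w", err)
	}
	defer func() {
		// Reap always; surface the reap error only if nothing else failed.
		if reapErr := d.ReapTeam(teamID); reapErr != nil && err == nil {
			err = fmt.Errorf("reap cohort team: %w", reapErr)
		}
	}()

	// Payload shape is assumed for illustration only.
	body := []byte(fmt.Sprintf(`{"event":"subscription.charged","team_id":%q}`, teamID))
	status, err := d.PostWebhook(body, signBody(secret, body))
	if err != nil {
		return fmt.Errorf("post webhook: %w", err)
	}
	if status != 200 {
		return fmt.Errorf("webhook returned %d", status)
	}
	tier, err := d.PlanTier(teamID)
	if err != nil {
		return fmt.Errorf("read plan_tier: %w", err)
	}
	if tier != wantTier {
		return fmt.Errorf("plan_tier is %q, want %q", tier, wantTier)
	}
	return nil
}
```

Putting the reap in a `defer` registered right after the mint is one way to get the "always, even on failure" guarantee without repeating cleanup on each early return.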
Observability (rule 25: infra PR ships in lockstep)
`instant_payment_probe_outcome_total{leg,result}` + `instant_payment_probe_latency_seconds{leg}` (lazy *Vec, primed in `metrics_test.go`) + the `InstantPaymentProbe` NR event (`cohort=synthetic`, excluded from billing/revenue dashboards) + an `audit_log` row + structured ERROR slog line on fail.
`result="degraded"` is the config-unset / slow-but-correct state and never pages, so the prober is inert AND non-paging until the operator wires the flag + (upgrade leg) the test secret.
infra PR: alert + prom rule + dashboard tile + METRICS-CATALOG row.
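The relationship between the `{leg,result}` labels, the `degraded` state, and paging can be illustrated with a stdlib-only sketch. Everything here is hypothetical: `outcomeCounter` is a plain map standing in for the real counter vector, `upgradeLegPreflight` and `shouldPage` are invented names, and the `"pass"` / `"fail"` label strings are assumed (only `"degraded"` is quoted in the description).

```go
package main

// Result label values. "degraded" comes from the description; the other two
// strings are assumptions.
const (
	ResultPass     = "pass"
	ResultFail     = "fail"
	ResultDegraded = "degraded"
)

// outcomeKey mirrors the {leg,result} label pair on the outcome counter.
type outcomeKey struct{ Leg, Result string }

// outcomeCounter is a map-based stand-in for the real metric.
type outcomeCounter map[outcomeKey]int

// Inc records one outcome for the given leg and result.
func (c outcomeCounter) Inc(leg, result string) { c[outcomeKey{leg, result}]++ }

// upgradeLegPreflight (hypothetical) reports degraded and skips the leg when
// either the test webhook secret or the test plan id is unset.
func upgradeLegPreflight(webhookSecret, testPlanID string) (result string, run bool) {
	if webhookSecret == "" || testPlanID == "" {
		return ResultDegraded, false
	}
	return "", true
}

// shouldPage encodes the rule that only a fail pages; degraded never does.
func shouldPage(result string) bool { return result == ResultFail }
```

With this shape, an unconfigured upgrade leg still emits an outcome (so the series exists on dashboards) but cannot trigger an alert.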
Flag-off-inert proof
`TestPaymentProbe_FlagOff_ProbesNothing`: with `Enabled=false` the worker no-ops (zero outcome emissions, no HTTP, no DB).
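The flag-off guarantee amounts to an early return before any side effect. A minimal sketch, assuming a hypothetical `Prober` type whose HTTP call and metric emission are injected as function fields (the real worker's structure may differ, and treating a transport error as a fail is also an assumption):

```go
package main

// Prober is a hypothetical, simplified probe worker. Do and Emit are
// injected so a test can count how often each side effect happens.
type Prober struct {
	Enabled bool
	Do      func(method, path string) (status int, err error)
	Emit    func(leg, result string)
}

// Work runs one probe cycle. With Enabled=false it returns before touching
// Do or Emit, so the job is fully inert.
func (p *Prober) Work() {
	if !p.Enabled {
		return
	}
	status, err := p.Do("GET", "/api/v1/billing")
	result := "pass"
	if err != nil || status >= 500 {
		result = "fail" // assumption: transport errors count as a fail
	}
	p.Emit("billing_state", result)
}
```

A flag-off test in this style would set `Enabled: false`, call `Work()`, and assert that both injected counters are still zero.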
Verification
- `make gate` green for the touched packages (`go build ./...` + `go vet ./...` + `go test`). The two `TestIntegration_BillingReconciler_*` failures seen only in the broad parallel `./...` run are pre-existing shared-test-DB contention flakes: both PASS in isolation and when run together under `-p 1` (62s); neither touches any file in this PR.
- `go test ./internal/jobs/ -run TestPaymentProbe -race` green.
- Coverage on `payment_probe.go`: the few gaps are the unreachable `build_request` defensive branches, matching the `flow_synthetic` pattern.
Operator follow-up (awaiting)
- `PAYMENT_PROBE_ENABLED=true` to activate.
- `RAZORPAY_TEST_WEBHOOK_SECRET` + `PAYMENT_PROBE_TEST_PLAN_ID_PRO` to enable the optional upgrade leg (skips clean otherwise).
🤖 Generated with Claude Code