manage/k8s: document decommission timing settings (--decommission-wait-interval, RequeueAfter) #1761
…t-interval, RequeueAfter)

Add a "Tune automatic decommission timing" section to the Kubernetes decommission guide explaining the re-check/requeue interval settings for both deployment modes:

- Operator: --decommission-wait-interval (default 8s), passed via the operator chart's additionalCmdFlags, which sets the Decommission controller's RequeueAfter (surfaced as the "next run in" log line).
- Helm sidecar: decommissionRequeueTimeout (10s) and decommissionAfter (60s).

Includes defaults, a worked helm example, how to read the interval from operator logs, and guidance for adjusting the values (recheck vs debounce, reallocation throughput is separate).

Ref: DOC-2270

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cale-in gate

Per EKS end-to-end testing: a user-initiated scale-in (reducing statefulset.replicas) is detected from a StatefulSet watch event and acted on promptly (~seconds) regardless of --decommission-wait-interval. The interval governs the periodic re-check cadence for conditions that arise without a triggering event (for example, a broker that becomes unreachable), so raising it does not delay routine scale-ins.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Docs review

Overall: Solid, technically accurate addition. No critical issues: all xrefs/anchors resolve and the behavior was EKS-verified. A few minor consistency suggestions below.

Critical issues

None. Both property xrefs are valid (

Suggestions

Impact on other files

None. Single existing page; no nav or What's New entry needed. Related PR #1717 (same file) is already merged, so there is no conflict.

What works well

These are all minor; the PR is in good shape to merge.
micheleRP
left a comment
lovely, thanks David! Claude has some minor suggestions you can consider
…ator, re-check wording
- Align both additionalCmdFlags examples to the shell-correct form
(--set "additionalCmdFlags={...}"): outer-quoted to protect {}/comma from
brace expansion, no pointless inner quotes. Verified the rendered list with
`helm template`: ["--additional-controllers=decommission","--decommission-wait-interval=30s"].
- Lowercase bare-noun "operator" (heading + table label) per docs convention.
- Intro: "polls ... to detect" -> "re-checks ... for" to match the event-driven note.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
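The quoting point in this commit can be checked without Helm. The sketch below is only an illustration: `count_args` is a hypothetical helper that reports how many words the shell passes for the quoted and unquoted forms of the same `--set` value.

```shell
#!/usr/bin/env bash
# count_args (hypothetical helper) prints the number of arguments it receives.
count_args() { echo "$#"; }

# Quoted: the whole list reaches the command as a single word.
quoted=$(count_args "additionalCmdFlags={--additional-controllers=decommission,--decommission-wait-interval=30s}")

# Unquoted: shells that perform brace expansion (bash, zsh) split {a,b} into
# two separate words before the command ever sees them.
unquoted=$(count_args additionalCmdFlags={--additional-controllers=decommission,--decommission-wait-interval=30s})

echo "quoted=${quoted} unquoted=${unquoted}"
```

Under bash the quoted form should be reported as one word and the unquoted form as two; a POSIX `sh` without brace expansion reports one word for both, which is why the outer quotes are the portable choice.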
Thanks, I addressed the feedback from the review. We should keep "Operator" instead of "operator" because the capitalized form is typically used in K8s to describe a Kubernetes Operator rather than a human operator.
…re noun
Per maintainer preference, capitalize bare-noun "Operator" page-wide (heading,
table label, prose) — reverts the earlier lowercasing. Chart path
`redpanda/operator` and the `{latest-operator-version}` attribute stay lowercase.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Thanks @micheleRP! Addressed:
Will backport to 25.3.x and 25.2.x.
Backports opened:
Each adds the "Tune automatic decommission timing" section + TIP cross-ref + shell-correct
What
Adds a "Tune automatic decommission timing" section to `manage/kubernetes/k-decommission-brokers.adoc` documenting the re-check / requeue interval settings for the automatic decommissioner, which were previously undocumented for the Operator deployment mode.

Covers, with defaults and a worked example:

| Deployment mode | Setting | Default |
|---|---|---|
| Operator | `--decommission-wait-interval` (via operator `additionalCmdFlags`) | `8s` |
| Helm sidecar | `decommissionRequeueTimeout` | `10s` |
| Helm sidecar | `decommissionAfter` | `60s` |

The section explains:

- … `--decommission-wait-interval` through the operator chart's `additionalCmdFlags`.
- … `RequeueAfter`, that is, the `next run in <interval>` value visible in the operator logs.
- … `decommissionAfter` debounce window, and that these intervals do not affect partition reallocation throughput (that's `raft_learner_recovery_rate` / `partition_autobalancing_concurrent_moves`).

Also adds a TIP cross-link from the existing Operator enablement step.
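For orientation, here is a minimal sketch of passing the flag through the operator chart's `additionalCmdFlags`. The release name and namespace are hypothetical placeholders, and the script only prints the `helm` command rather than running it; the chart reference `redpanda/operator` and the two flags are the ones named in this PR.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical release name and namespace; adjust for your cluster.
RELEASE="${RELEASE:-redpanda-controller}"
NAMESPACE="${NAMESPACE:-redpanda}"
# Re-check interval to request (the documented default is 8s).
INTERVAL="${INTERVAL:-30s}"

# The double quotes protect {a,b} from brace expansion in the shell.
FLAGS="additionalCmdFlags={--additional-controllers=decommission,--decommission-wait-interval=${INTERVAL}}"

# Dry run: print the command. Remove the leading `echo` to apply it.
echo helm upgrade --install "$RELEASE" redpanda/operator \
  --namespace "$NAMESPACE" --set "$FLAGS"
```

Setting `INTERVAL` in the environment changes the requested cadence without editing the script.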
Why
Customer question (Arctic Wolf) on the Decommission controller's reconcile cadence and the `8s` default; the flag and `RequeueAfter` behavior were not documented. Tracked in DOC-2270.

Verification (EKS)
Validated on a fresh EKS cluster:
- `--decommission-wait-interval=300s`: the `DecommissionReconciler` watching the V2 (`Redpanda` CRD) StatefulSet logs `successful reconciliation finished in 1m0s, next run in 5m0s`, confirming the flag sets the requeue cadence (default would be `8s`). The ~1-minute reconcile duration is the old controller's inline cluster-health stability wait.
- … `ClusterReconciler`; this interval governs the secondary re-check cadence, not scale-down speed.

Notes for reviewers

- `cmd/run/run.go` (`--decommission-wait-interval`, default `8s`) → `internal/controller/olddecommission/redpanda_decommission_controller.go` (uses it as the `RequeueAfter` fallback over the 30s constant).
- `--additional-controllers=decommission` selects the controller that consumes this flag for V2 clusters (verified on EKS). A refactor on `main` (commit `e9e70de`, 2026-06-22) switches `decommission` to the new NodePool-aware `StatefulSetDecommissioner` (which ignores this flag) and renames the old one to `legacy-decommission`. That change is not in any release or pre-release yet (latest pre-release is beta.2 from 2026-05-29); it will ship in a later 26.2 build. This page documents current behavior; it should get a version note when that build ships.

🤖 Generated with Claude Code
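The `next run in` value quoted in the EKS verification can be pulled out of log text with a small filter. This is a sketch that assumes the plain-text message shape shown in this PR; `next_run_interval` is a hypothetical helper name, and structured or JSON log output may need a different pattern.

```shell
#!/usr/bin/env bash
# next_run_interval (hypothetical helper) reads log lines on stdin and prints
# the duration that follows "next run in" on each matching line.
next_run_interval() {
  sed -n 's/.*next run in \([0-9][0-9hmsu.]*\).*/\1/p'
}

# Sample message copied from the EKS verification notes in this PR.
sample='successful reconciliation finished in 1m0s, next run in 5m0s'
interval=$(printf '%s\n' "$sample" | next_run_interval)
echo "$interval"
```

Against a live cluster, the same function could be fed from `kubectl logs` for the operator workload; the workload name and namespace depend on your installation.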