manage/k8s: document decommission timing settings (--decommission-wait-interval, RequeueAfter) by david-yu · Pull Request #1761 · redpanda-data/docs

david-yu · 2026-06-23T20:54:50Z

What

Adds a "Tune automatic decommission timing" section to manage/kubernetes/k-decommission-brokers.adoc that documents the re-check and requeue interval settings for the automatic decommissioner. These settings were previously undocumented for the Operator deployment mode.

Covers, with defaults and a worked example:

Setting	Default	Mode
`--decommission-wait-interval` (via operator `additionalCmdFlags`)	`8s`	Operator
`decommissionRequeueTimeout`	`10s`	Helm sidecar
`decommissionAfter`	`60s`	Helm sidecar

The section explains:

How to pass --decommission-wait-interval through the operator chart's additionalCmdFlags.
That this flag sets the Decommission controller's `RequeueAfter`, which is the `next run in <interval>` value visible in the operator logs.
Guidance for adjusting the values: re-check cadence versus the `decommissionAfter` debounce window, and a note that these intervals do not affect partition reallocation throughput (that is governed by `raft_learner_recovery_rate` and `partition_autobalancing_concurrent_moves`).
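
To make the re-check versus debounce distinction concrete, here is a toy Python model. It is a sketch under stated assumptions, not the sidecar's or the Operator's actual logic: it assumes a purely periodic checker that polls at fixed multiples of the re-check interval and acts at the first poll where a fault has been observed for at least the debounce window. The function name and the model itself are hypothetical; only the Helm sidecar defaults (10s and 60s) come from this PR.

```python
import math


def decommission_delay(fault_at: float, recheck_interval: float, debounce: float) -> float:
    """Seconds between a fault and the poll that acts on it, in this toy model.

    Assumptions (hypothetical, for illustration only):
    - polls happen at t = 0, recheck_interval, 2 * recheck_interval, ...
    - the fault is first noticed at the first poll at or after fault_at
    - the checker acts at the first later poll where the fault has been
      observed for at least `debounce` seconds
    """
    if recheck_interval <= 0:
        raise ValueError("recheck_interval must be positive")
    # First poll at or after the fault.
    first_seen = math.ceil(fault_at / recheck_interval) * recheck_interval
    # Number of further polls needed to cover the debounce window.
    polls_needed = math.ceil(debounce / recheck_interval)
    acted_at = first_seen + polls_needed * recheck_interval
    return acted_at - fault_at


# Helm sidecar defaults from this PR: 10s re-check, 60s debounce.
print(decommission_delay(fault_at=3, recheck_interval=10, debounce=60))
# Same debounce with a slower 30s re-check.
print(decommission_delay(fault_at=3, recheck_interval=30, debounce=60))
```

In this model the delay is never shorter than the debounce window and always less than the debounce window plus two re-check intervals, which is why the two values are tuned separately. Neither value appears in any reallocation-rate calculation, consistent with the note that partition movement throughput is controlled elsewhere.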

Also adds a TIP cross-link from the existing Operator enablement step.

Why

Customer question (Arctic Wolf) on the Decommission controller's reconcile cadence and the 8s default; the flag and RequeueAfter behavior were not documented. Tracked in DOC-2270.

Verification (EKS)

Validated on a fresh EKS cluster:

Operator 26.1.6 with `--decommission-wait-interval=300s`: the DecommissionReconciler watching the V2 (Redpanda CRD) StatefulSet logs `successful reconciliation finished in 1m0s, next run in 5m0s`, confirming that the flag sets the requeue cadence (the default would be 8s). The roughly 1-minute reconcile duration is the old controller's inline cluster-health stability wait.
Operator 26.2.1-beta.2: identical behavior (still the old reconciler).
On an intentional scale-down, the broker is decommissioned promptly (~2s) by the operator's core ClusterReconciler; this interval governs the secondary re-check cadence, not scale-down speed.
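
The log check above can be scripted. The following is a minimal Python sketch, assuming the operator log contains the message quoted in this PR (`... finished in 1m0s, next run in 5m0s`) and that the interval is printed in Go's duration string format. The function names are hypothetical, and only hour, minute, second, and millisecond units are handled.

```python
import re

# Seconds per unit for the subset of Go duration units handled here.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is listed before "m" so that "500ms" is not read as minutes.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_NEXT_RUN = re.compile(r"next run in ([0-9a-z.]+)")


def parse_go_duration(text: str) -> float:
    """Convert a Go-style duration such as '5m0s' into seconds."""
    total = 0.0
    pos = 0
    for match in _PART.finditer(text):
        if match.start() != pos:
            # A gap means the string contains something we do not understand.
            raise ValueError(f"unsupported duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"unsupported duration: {text!r}")
    return total


def next_run_seconds(log_line: str):
    """Return the 'next run in' interval in seconds, or None if absent."""
    match = _NEXT_RUN.search(log_line)
    return parse_go_duration(match.group(1)) if match else None


line = "successful reconciliation finished in 1m0s, next run in 5m0s"
print(next_run_seconds(line))  # 300.0, matching --decommission-wait-interval=300s
```

Comparing the returned value with the configured interval gives a quick confirmation that the flag was picked up by the running controller.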

Notes for reviewers

Source of truth: operator cmd/run/run.go (--decommission-wait-interval, default 8s) → internal/controller/olddecommission/redpanda_decommission_controller.go (uses it as the RequeueAfter fallback over the 30s constant).
Version sensitivity: in all currently installable operators (25.3.x, 26.1.x, and 26.2.1-beta.2) --additional-controllers=decommission selects the controller that consumes this flag for V2 clusters (verified on EKS). A refactor on main (commit e9e70de, 2026-06-22) switches decommission to the new NodePool-aware StatefulSetDecommissioner (which ignores this flag) and renames the old one to legacy-decommission. That change is not in any release or pre-release yet (latest pre-release is beta.2 from 2026-05-29); it will ship in a later 26.2 build. This page documents current behavior; it should get a version note when that build ships.

🤖 Generated with Claude Code

Preview pages

Decommission Brokers in Kubernetes (updated)

…t-interval, RequeueAfter)

Add a "Tune automatic decommission timing" section to the Kubernetes decommission guide explaining the re-check/requeue interval settings for both deployment modes:

- Operator: --decommission-wait-interval (default 8s), passed via the operator chart's additionalCmdFlags, which sets the Decommission controller's RequeueAfter (surfaced as the "next run in" log line).
- Helm sidecar: decommissionRequeueTimeout (10s) and decommissionAfter (60s).

Includes defaults, a worked helm example, how to read the interval from operator logs, and guidance for adjusting the values (recheck vs debounce, reallocation throughput is separate).

Ref: DOC-2270
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

netlify · 2026-06-23T20:54:54Z

✅ Deploy Preview for redpanda-docs-preview ready!

Name	Link
🔨 Latest commit	`8052d9c`
🔍 Latest deploy log	https://app.netlify.com/projects/redpanda-docs-preview/deploys/6a3b66da4af52a000801fad0
😎 Deploy Preview	https://deploy-preview-1761--redpanda-docs-preview.netlify.app

coderabbitai · 2026-06-23T20:55:06Z

Walkthrough

The pull request updates modules/manage/pages/kubernetes/k-decommission-brokers.adoc with two additions. A tip is inserted in the BrokerDecommissioner setup steps directing users to pass --decommission-wait-interval via additionalCmdFlags and linking to a new section. That new section, "Tune automatic decommission timing," documents the polling interval and debounce parameters for both the Operator's Decommission controller (--decommission-wait-interval) and the Helm sidecar deployment (decommissionRequeueTimeout, decommissionAfter), along with defaults, example commands, sample log output, and a clarification that these settings affect re-check timing only, not partition reallocation throughput.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

redpanda-data/docs#1717: Directly related — both PRs modify the same file's BrokerDecommissioner instructions, specifically around --decommission-wait-interval via additionalCmdFlags and decommission timing behavior.

Suggested reviewers

kbatuigas
joe-redpanda

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The pull request provides a detailed description with 'What', 'Why', 'Verification', and 'Notes for reviewers' sections, but is missing required template sections including a JIRA ticket link, review deadline, page previews, and checkbox selections.	Add the missing template sections: link to the JIRA ticket (DOC-2270), set review deadline, include Netlify page preview URL, and mark appropriate checkboxes (likely 'Content gap' based on the undocumented settings being addressed).

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: documenting decommission timing settings for Kubernetes.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

…cale-in gate

Per EKS end-to-end testing: a user-initiated scale-in (reducing statefulset.replicas) is detected from a StatefulSet watch event and acted on promptly (~seconds) regardless of --decommission-wait-interval. The interval governs the periodic re-check cadence for conditions that arise without a triggering event (for example, a broker that becomes unreachable), so raising it does not delay routine scale-ins.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

micheleRP · 2026-06-23T22:21:19Z

Docs review

Overall: Solid, technically accurate addition. No critical issues — all xrefs/anchors resolve and the behavior was EKS-verified. A few minor consistency suggestions below.

Critical issues

None. Both property xrefs are valid (reference:cluster-properties.adoc and reference:tunable-properties.adoc both alias to properties/cluster-properties.adoc, and both anchors exist in the included partial).

Suggestions

Capitalization: the heading "Set the interval for the Operator" capitalizes the bare noun. The docs convention is lowercase "operator" in prose (~7:1 across the repo), reserving "Redpanda Operator" for the product name — and this same file already uses lowercase ("the operator detects the change"). Recommend: "Set the interval for the operator".
Helm command quoting: the new operator example uses --set "additionalCmdFlags={...=decommission,--decommission-wait-interval=30s}" (whole arg quoted) while the existing example on the page uses --set additionalCmdFlags={--additional-controllers="decommission"}. The new form is more shell-correct — worth aligning the two for consistency.
Intro precision: the intro says the decommissioner "polls the cluster on a regular interval," but the later clarification (and your EKS testing) notes operator scale-ins are event-driven via a StatefulSet watch. Softening "polls… to detect" → "re-checks" would align the intro with the later bullet.

Impact on other files

None. Single existing page; no nav or What's New entry needed. Related PR #1717 (same file) is already merged — no conflict.

What works well

EKS-verified behavior; clean Operator-vs-Helm separation; good internal cross-linking; correct AsciiDoc throughout (table, [.no-copy] log block, {latest-operator-version} attribute, anchors).
Good catch on the version-sensitivity note (the main refactor to legacy-decommission) — worth a version note when that build ships, as you flagged.

These are all minor; the PR is in good shape to merge.

micheleRP

lovely, thanks David! Claude has some minor suggestions you can consider

…ator, re-check wording

- Align both additionalCmdFlags examples to the shell-correct form (--set "additionalCmdFlags={...}"): outer-quoted to protect {}/comma from brace expansion, no pointless inner quotes. Verified the rendered list with `helm template`: ["--additional-controllers=decommission","--decommission-wait-interval=30s"].
- Lowercase bare-noun "operator" (heading + table label) per docs convention.
- Intro: "polls ... to detect" -> "re-checks ... for" to match the event-driven note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

david-yu · 2026-06-24T05:10:31Z

Thanks, I addressed the feedback from the review. We should keep "Operator" instead of "operator" because the capitalized form is typically used in Kubernetes to mean a Kubernetes Operator rather than a human operator.

…re noun

Per maintainer preference, capitalize bare-noun "Operator" page-wide (heading, table label, prose); reverts the earlier lowercasing. Chart path `redpanda/operator` and the `{latest-operator-version}` attribute stay lowercase.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

david-yu · 2026-06-24T05:11:11Z

Thanks @micheleRP! Addressed:

Helm --set quoting — aligned both additionalCmdFlags examples to the shell-correct form --set "additionalCmdFlags={...}" (outer-quoted, no inner quotes). I verified the rendered list with helm template:
- --set "additionalCmdFlags={--additional-controllers=decommission}" → ["--additional-controllers=decommission"]
- --set "additionalCmdFlags={--additional-controllers=decommission,--decommission-wait-interval=30s}" → ["--additional-controllers=decommission","--decommission-wait-interval=30s"]
  The outer quotes matter for the multi-flag form specifically: without them the shell brace-expands {a,b} and mangles the list; the previous inner-quoted form (={...="decommission"}) only worked because an interactive shell strips the quotes.
Intro precision — "polls … to detect" → "re-checks … for", matching the event-driven note below.
Capitalization — went the other way here, per maintainer preference: using capital "Operator" (the bare-noun product reference) since it's the more accepted form in Kubernetes docs. Applied page-wide for consistency; the redpanda/operator chart path and {latest-operator-version} attribute stay lowercase. Flagging so it's clear that's a deliberate choice, not an unaddressed comment.
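
The brace-expansion point above can be checked without Helm. This Python sketch, which assumes `bash` is available on the PATH, asks bash to print each argument it would pass to a command. The helper name `shell_words` is hypothetical, and nothing is installed or rendered; it only shows how the shell splits the text before Helm would ever see it.

```python
import shutil
import subprocess

# The multi-flag value from this PR, without any surrounding quotes.
FLAGS = "additionalCmdFlags={--additional-controllers=decommission,--decommission-wait-interval=30s}"


def shell_words(argument_text: str) -> list[str]:
    """Return the arguments bash would pass to a command for the given text."""
    bash = shutil.which("bash")
    if bash is None:
        raise RuntimeError("bash is required for this demonstration")
    # printf '%s\n' prints every argument it receives on its own line.
    script = "printf '%s\\n' " + argument_text
    result = subprocess.run(
        [bash, "-c", script], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Calling `shell_words(FLAGS)` returns two separate `additionalCmdFlags=...` arguments because bash expands the braces, while `shell_words('"' + FLAGS + '"')` returns the single argument with its braces intact, which is the form `--set` needs in order to build the list.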

david-yu · 2026-06-24T17:18:25Z

Will backport to 25.3.x and 25.2.x

david-yu · 2026-06-24T17:46:36Z

Backports opened:

v/25.3 → [v/25.3] manage/k8s: document decommission timing (--decommission-wait-interval) #1763
v/25.2 → [v/25.2] manage/k8s: document decommission timing (--decommission-wait-interval) #1764

Each adds the "Tune automatic decommission timing" section + TIP cross-ref + shell-correct --set quoting. The main-only Operator-example rewrite (the "do not add brokerDecommissioner" paragraph/callout) was intentionally omitted on both, since the version branches' Operator examples still use the brokerDecommissioner sidecar.

david-yu requested a review from a team as a code owner June 23, 2026 20:54

micheleRP approved these changes Jun 23, 2026

micheleRP merged commit 9e3a661 into main Jun 24, 2026
7 checks passed

micheleRP deleted the dyu/decommission-wait-interval-docs branch June 24, 2026 13:40

This was referenced Jun 24, 2026

[v/25.3] manage/k8s: document decommission timing (--decommission-wait-interval) #1763

Merged

[v/25.2] manage/k8s: document decommission timing (--decommission-wait-interval) #1764

Open

This was referenced Jun 24, 2026

upgrade: document broker restarts during operator upgrades #1758

Open

manage/rpk: document OAUTHBEARER (OIDC) for Admin API + Schema Registry, add validation step #1762

Open

micheleRP merged 4 commits into main from dyu/decommission-wait-interval-docs
