OCPEDGE-2687: Add LVMS CI Doctor periodic job #79349
Walkthrough
Adds a weekly LVMS "CI Doctor" workflow: a doctor step that runs Claude and writes artifacts, a post step to fetch the HTML report, step/workflow refs and metadata, OWNERS files, and wiring into the edge-tooling CI config with Slack reporting.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh`:
- Line 3: The script currently enables global shell tracing with set -x which
causes sensitive credential operations (the block that executes the credentialed
Claude invocation / commands that use mounted credentials) to be logged; wrap
that sensitive section by inserting set +x immediately before the credentialed
Claude execution block and restore tracing with set -x immediately after so
tracing is disabled only for the credential-handling commands and re-enabled for
the rest of the script.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: a380b29a-5a72-4282-9905-452be692b28d
⛔ Files ignored due to path filters (1)
- ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (7)
- ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/OWNERS
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.metadata.json
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.yaml
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.metadata.json
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.yaml
@@ -0,0 +1,109 @@
#!/bin/bash
set -euo pipefail
set -x
Disable xtrace during credentialed Claude execution.
Line 3 enables tracing globally, so Line 102 execution runs under set -x while using mounted credentials. Temporarily disable tracing for sensitive sections, then restore it.
Proposed fix
set -euo pipefail
set -x
@@
echo "Running Claude to analyze LVMS CI jobs..."
+set +x
timeout 3000 claude \
--model "${CLAUDE_MODEL}" \
--max-turns 100 \
--output-format stream-json \
--plugin-dir "${PLUGIN_DIR}" \
-p "/lvms-ci:doctor ${RELEASE_VERSIONS}" \
--verbose 2>&1 | tee "${CLAUDE_ANALYSIS_LOG}"
+set -x
echo "Analysis for LVMS CI jobs completed"
As per coding guidelines: "When writing step registry command scripts, protect sensitive information ... Disable shell tracing (set +x) temporarily when handling sensitive operations in step registry scripts, then restore it after."
Also applies to: 102-108
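For a script that does handle credentials, the suggested `set +x` / `set -x` pair could be wrapped in a small helper so the previous tracing state is restored automatically. The following is a minimal sketch under that assumption; `run_untraced` is a hypothetical name and is not part of this PR.

```shell
#!/bin/bash
set -euo pipefail

# run_untraced is a hypothetical helper (not from the PR).
# It runs the given command with xtrace off, then restores the caller's setting.
run_untraced() {
    local was_tracing=0
    case "$-" in *x*) was_tracing=1 ;; esac   # $- contains "x" while xtrace is on
    { set +x; } 2>/dev/null                   # braces + redirect hide "set +x" itself from the trace
    local rc=0
    "$@" || rc=$?                             # capture the exit status without tripping errexit
    if [ "${was_tracing}" -eq 1 ]; then set -x; fi
    return "${rc}"
}

set -x
echo "tracing is on here"
run_untraced echo "this command is not traced"
echo "tracing is on again"
```

The helper takes a simple command and its arguments. A pipeline such as the `claude ... | tee` invocation above would need to be placed in a function first and that function passed to the helper.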
The script doesn't handle any sensitive credentials. There are no mounted secrets, no API tokens read from files, and no credential variables being set. The claude command on line 102 uses --model (a model name like "sonnet") and --plugin-dir (a filesystem path); neither is sensitive. The Claude CLI authenticates via its own internal mechanism (baked into the container image), not via environment variables or mounted secrets that would leak in trace output.
The set -x tracing here only logs filesystem paths, non-secret environment variables (EDGE_TOOLING_DIR, RELEASE_VERSIONS, CLAUDE_MODEL), and standard shell operations. There's nothing to protect.
Force-pushed from baaa715 to d080042.
@kasturinarra: This pull request references OCPEDGE-2687 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
/pj-rehearse periodic-ci-openshift-eng-edge-tooling-main-lvms-ci-doctor
@kasturinarra: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
Force-pushed from d080042 to 149eacd.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh`:
- Line 3: Remove the global tracing flag and make the script use safe defaults:
replace the top-level "set -x" with "set -euo pipefail" (i.e., ensure the script
initializes strict error handling) and, if debugging is needed, enable xtrace
only around the specific short command sections by wrapping those commands with
a temporary "set -x" / "set +x" pair; target the line containing "set -x" and
any debug blocks in this script (the top-level invocation) to implement the
change.
In
`@ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.yaml`:
- Line 13: The post-step timeout is too short (currently "timeout: 1m0s") and
can kill the pod before the curl download (which uses --max-time 300) completes;
update the timeout field in openshift-edge-tooling-lvms-ci-post-ref.yaml (the
"timeout: 1m0s" entry) to a value larger than the curl window (for example
"timeout: 6m0s") so the step allows enough time for retries and downloads to
finish.
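The timeout finding is a comparison between the step's `timeout` value and the curl window. A minimal sketch of that check follows; the helper names are hypothetical, only the `h`/`m`/`s` duration units are handled, and the sketch assumes a single curl attempt bounded by `--max-time`.

```shell
#!/bin/bash
set -euo pipefail

# duration_to_seconds converts a duration such as "1m0s" or "6m0s" to seconds.
# Hypothetical helper; supports only the h, m, and s units.
duration_to_seconds() {
    local d="$1" total=0
    while [[ "${d}" =~ ^([0-9]+)([hms])(.*)$ ]]; do
        case "${BASH_REMATCH[2]}" in
            h) total=$(( total + 10#${BASH_REMATCH[1]} * 3600 )) ;;
            m) total=$(( total + 10#${BASH_REMATCH[1]} * 60 )) ;;
            s) total=$(( total + 10#${BASH_REMATCH[1]} )) ;;
        esac
        d="${BASH_REMATCH[3]}"
    done
    [[ -z "${d}" ]] || return 1   # leftover text means an unsupported format
    echo "${total}"
}

# step_timeout_covers_curl succeeds when the step timeout exceeds the curl window.
step_timeout_covers_curl() {
    local step_timeout="$1" curl_max_time="$2"
    local step_seconds
    step_seconds="$(duration_to_seconds "${step_timeout}")"
    (( step_seconds > curl_max_time ))
}

CURL_MAX_TIME=300   # from the review comment: curl uses --max-time 300

for candidate in 1m0s 6m0s; do
    if step_timeout_covers_curl "${candidate}" "${CURL_MAX_TIME}"; then
        echo "timeout ${candidate}: covers the ${CURL_MAX_TIME}s curl window"
    else
        echo "timeout ${candidate}: shorter than the ${CURL_MAX_TIME}s curl window"
    fi
done
```

If the script also retries the download, the budget would need to account for each attempt, so the step timeout should leave additional headroom.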
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 810a692e-407e-4a6f-a47b-15e20cdc817f
⛔ Files ignored due to path filters (1)
- ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (12)
- ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/OWNERS
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/OWNERS
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.metadata.json
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.yaml
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.metadata.json
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.yaml
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/OWNERS
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.metadata.json
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.yaml
✅ Files skipped from review due to trivial changes (3)
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/OWNERS
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.metadata.json
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.metadata.json
🚧 Files skipped from review as they are similar to previous changes (7)
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.yaml
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.yaml
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/OWNERS
- ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/OWNERS
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.metadata.json
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh
@@ -0,0 +1,30 @@
#!/bin/bash
set -euo pipefail
set -x
Avoid always-on xtrace in step-registry scripts.
set -x is enabled globally. Keep tracing off by default and only enable it for short debug sections to reduce log leakage risk.
As per coding guidelines: "Step registry command scripts should use set -euo pipefail by default and only enable set -x for debugging, avoiding accidental disclosure of passwords, tokens, API keys, cluster URLs, or kubeconfig contents".
There are no secrets in this file. Debugging can be enabled unconditionally.
@ggiguash, that's a fair point — looking at the script, it only constructs a GCS URL and downloads an HTML report via curl, with no credentials, tokens, or sensitive values in scope. The concern doesn't apply here. Happy to dismiss this.
✏️ Learnings added
Learnt from: ggiguash
Repo: openshift/release PR: 79349
File: ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh:3-3
Timestamp: 2026-05-18T09:59:38.396Z
Learning: In the openshift/release repository, when reviewing step-registry command scripts for `set -x` usage, only flag it as a potential issue if the script actually handles secrets, credentials, tokens, API keys, kubeconfig contents, or other sensitive values. Scripts that purely perform non-sensitive operations (e.g., URL construction and downloading public artifacts via curl) can safely use `set -x` unconditionally.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: tthvo
Repo: openshift/release PR: 79214
File: ci-operator/step-registry/upi/install/aws/usgov/private/cluster/upi-install-aws-usgov-private-cluster-commands.sh:834-836
Timestamp: 2026-05-13T04:27:10.115Z
Learning: When reviewing shell scripts that invoke the AWS CLI, do not flag use of `--filter` as incorrect if the intended argument is `--filters` (plural). AWS CLI supports prefix matching for argument names, so `--filter` is an accepted alias for `--filters` for relevant `aws ec2 describe-*` commands (e.g., `describe-availability-zones`, `describe-instances`, `describe-subnets`). Both forms should be treated as equivalent.
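The reply above describes the post step as constructing a GCS URL and downloading an HTML report via curl. A rough sketch of a script of that shape is shown below. Every name, the URL layout, and the host are assumptions for illustration; the actual script's variables and paths may differ.

```shell
#!/bin/bash
set -euo pipefail
set -x   # nothing sensitive is handled below, so tracing stays on throughout

# All names here are hypothetical; the real script's variables and URL layout may differ.
# report_url joins a base URL, job name, build ID, and file name into an artifact URL.
report_url() {
    local base="$1" job="$2" build="$3" file="$4"
    echo "${base%/}/${job}/${build}/artifacts/${file}"
}

# fetch_report downloads the report, failing on HTTP errors and capping the transfer at 300 s.
fetch_report() {
    local url="$1" dest="$2"
    curl --fail --silent --show-error --location --max-time 300 --output "${dest}" "${url}"
}

url="$(report_url "https://storage.example.invalid/logs/" "example-job" "12345" "report.html")"
echo "Report URL: ${url}"
# fetch_report "${url}" "/tmp/report.html"   # uncomment where the URL is reachable
```

Because the trace output contains only the URL and file paths, leaving `set -x` on here matches the learning recorded above.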
Add step registry ref and workflow for the LVMS CI doctor that analyzes LVMS periodic job failures using Claude AI. Runs every Sunday at 5 PM UTC (after weekly qe-integration-tests complete) and reports results to #team-ocp-edge-collab.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 149eacd to aa5dee7.
/pj-rehearse periodic-ci-openshift-eng-edge-tooling-main-lvms-ci-doctor
@kasturinarra: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
@kasturinarra: all tests passed!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ggiguash, kasturinarra
/pj-rehearse ack
@kasturinarra: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
Summary
- openshift-edge-tooling-lvms-ci-doctor)
- qe-integration-tests jobs complete (~8:30 AM UTC)
- #team-ocp-edge-collab Slack channel
Details
- edge-tooling-ai-helpers CI image with Claude Code pre-installed
- lvms-ci:doctor plugin skill from openshift-eng/edge-tooling (merged in PR "Fix prow README and image namespace refs" #147)
- main branch only
Files
- ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/
- ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml: lvms-ci-doctor periodic test entry
- ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml
Test plan
🤖 Generated with Claude Code
OpenShift CI Infrastructure: LVMS CI Doctor periodic job (edge-tooling)
This PR adds a weekly, AI-driven diagnostic periodic job for the edge-tooling repository (openshift-eng/edge-tooling) to the openshift/release repository. It introduces a step-registry step, workflow, command scripts, metadata/OWNERS, and the prow periodic configuration so the job runs on main.
Practical impact
#team-ocp-edge-collab.
Behavioral notes and limitations
Files / areas affected (high-level)
Overall effect