OCPEDGE-2687: Add LVMS CI Doctor periodic job#79349

Merged
openshift-merge-bot[bot] merged 1 commit into
openshift:mainfrom
kasturinarra:lvms-ci-doctor
May 18, 2026

Conversation

@kasturinarra
Contributor

@kasturinarra kasturinarra commented May 15, 2026

Summary

  • Adds step registry ref and workflow for the LVMS CI doctor (openshift-edge-tooling-lvms-ci-doctor)
  • Analyzes LVMS periodic job failures using Claude AI, similar to the existing MicroShift CI doctor
  • Scheduled every Sunday at 5 PM UTC, after weekly qe-integration-tests jobs complete (~8:30 AM UTC)
  • Reports results (HTML report link) to #team-ocp-edge-collab Slack channel
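
The schedule in the summary can be sanity-checked with a small sketch. It assumes the periodic uses the five-field cron expression `0 17 * * 0` (minute 0, hour 17, weekday 0 for Sunday); the PR text only says "every Sunday at 5 PM UTC", so the exact expression and the helper name `matches_doctor_schedule` are illustrative assumptions. The helper relies on GNU `date -d`.

```shell
#!/bin/bash
set -euo pipefail

# Assumed cron expression for "every Sunday at 17:00 UTC"; not quoted from the PR.
DOCTOR_CRON="0 17 * * 0"

# Hypothetical helper: succeeds when the given UTC timestamp falls on the
# Sunday 17:00 slot that DOCTOR_CRON describes. Requires GNU date (-d).
matches_doctor_schedule() {
    local timestamp="$1"
    # %w prints the weekday with Sunday as 0, matching cron's numbering.
    [ "$(date -u -d "${timestamp}" '+%M %H %w')" = "00 17 0" ]
}

if matches_doctor_schedule "2026-05-17 17:00:00"; then
    echo "2026-05-17 17:00 UTC matches '${DOCTOR_CRON}'"
fi
```

A check like this is only a convenience for reasoning about the schedule; the authoritative value is whatever cron string the ci-operator config carries.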

Details

  • Uses edge-tooling-ai-helpers CI image with Claude Code pre-installed
  • Invokes the lvms-ci:doctor plugin skill from openshift-eng/edge-tooling (merged in PR #147, "Fix prow README and image namespace refs")
  • Simplified compared to MicroShift CI doctor: no Jira integration, no GitHub app token, no bug creation, no rebase PR handling
  • Single release target: main branch only

Files

| File | Description |
| --- | --- |
| ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/ | Step registry ref, workflow, and commands script |
| ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml | Added lvms-ci-doctor periodic test entry |
| ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml | Auto-generated periodics config |

Test plan

  • CI rehearsal passes
  • Verify step registry metadata is generated correctly
  • First scheduled run on Sunday produces HTML report

🤖 Generated with Claude Code

OpenShift CI Infrastructure: LVMS CI Doctor periodic job (edge-tooling)

This PR adds a weekly, AI-driven diagnostic periodic job for the edge-tooling repository (openshift-eng/edge-tooling) to the openshift/release repository. It introduces a step-registry step, workflow, command scripts, metadata/OWNERS, and the prow periodic configuration so the job runs on main.

Practical impact

  • New Prow periodic: openshift-edge-tooling-lvms-ci-doctor — scheduled weekly (Sundays 17:00 UTC) on main. It analyzes LVMS periodic job failures, produces an HTML report plus JSON/TXT/LOG artifacts, and posts the report link to #team-ocp-edge-collab.
  • Step/workflow: adds a step-ref and workflow under ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/:
    • Step uses ci/edge-tooling-ai-helpers (Claude Code preinstalled), mounts a service-account token for Claude/GCP credentials, sets defaults for Claude/Vertex/GCP, requests 2000m CPU / 4Gi memory, and times out after 1h (10m grace).
    • Workflow runs the doctor step and a post step that retrieves the HTML report for Spyglass.
    • Metadata and OWNERS assign approvers/reviewers to openshift-edge-approvers / openshift-edge-reviewers.
  • Command scripts:
    • openshift-edge-tooling-lvms-ci-doctor-commands.sh: configures a Claude session (settings.json) enabling the lvms-ci:doctor skill, runs claude with streaming JSON output to claude-analysis.log, and on exit collects artifacts, archives Claude projects, signals availability via SHARED_DIR, and enforces a successful Claude "result" event.
    • openshift-edge-tooling-lvms-ci-post-commands.sh: post-step that downloads the generated HTML report from GCS into ARTIFACT_DIR so Spyglass can display it.
  • CI config: registers the lvms-ci-doctor test in ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml and the autogenerated periodic entry appears in ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml (prowgen target).
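
The exit-time behavior described for the doctor commands script can be sketched roughly as follows. This is not the script from the PR: `WORKDIR`, `collect_artifacts`, and the stub report are hypothetical, and `ARTIFACT_DIR` / `SHARED_DIR` fall back to temporary directories so the sketch runs outside ci-operator. Only the marker name `claude-report-available` comes from the review summary.

```shell
#!/bin/bash
set -euo pipefail

# ci-operator normally provides these; temp directories keep the sketch standalone.
ARTIFACT_DIR="${ARTIFACT_DIR:-$(mktemp -d)}"
SHARED_DIR="${SHARED_DIR:-$(mktemp -d)}"
WORKDIR="$(mktemp -d)"   # hypothetical working directory for the analysis

# Hypothetical collector: copy report-like files into ARTIFACT_DIR and, if an
# HTML report exists, drop the marker file that the post step looks for.
collect_artifacts() {
    local file
    for file in "${WORKDIR}"/*.html "${WORKDIR}"/*.json "${WORKDIR}"/*.txt "${WORKDIR}"/*.log; do
        # Unmatched globs stay literal, so skip anything that is not a real file.
        [ -f "${file}" ] || continue
        cp "${file}" "${ARTIFACT_DIR}/"
    done
    if compgen -G "${ARTIFACT_DIR}/*.html" >/dev/null; then
        touch "${SHARED_DIR}/claude-report-available"
    fi
}

# The subshell stands in for the analysis run; its EXIT trap fires even
# though the run ends with a failure.
(
    trap collect_artifacts EXIT
    echo "<html>stub report</html>" > "${WORKDIR}/report.html"
    exit 1   # simulated analysis failure
) || echo "analysis failed, but the exit trap still collected artifacts"

ls "${ARTIFACT_DIR}" "${SHARED_DIR}"
```

Registering the collector as an `EXIT` trap means artifacts are preserved whether the analysis succeeds, fails, or is cut short by `timeout`.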

Behavioral notes and limitations

  • Relies on the edge-tooling-ai-helpers image (Claude Code) and the lvms-ci:doctor plugin from openshift-eng/edge-tooling (plugin merged separately).
  • Simpler than MicroShift CI doctor: no Jira integration, no GitHub app token usage, no automated bug creation or rebase PR handling; targets main only.
  • Test plan: CI rehearsal must pass, step-registry metadata generation must succeed, and the first scheduled run should produce the HTML report artifact.

Files / areas affected (high-level)

  • New step-registry refs, workflow, command scripts, metadata, and OWNERS under ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/ (and related post/OWNERS).
  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml updated to include the lvms-ci-doctor test.
  • Autogenerated prow periodic entry added under ci-operator/jobs/.../openshift-eng-edge-tooling-main-periodics.yaml.

Overall effect

  • Adds a scheduled AI-driven LVMS periodic-job analyzer for edge-tooling that generates an HTML diagnostic report, archives artifacts for Spyglass, and notifies the edge Slack channel.

@openshift-ci openshift-ci Bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) May 15, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5a1586fd-f8f8-47ea-bf20-21e424478187

📥 Commits

Reviewing files that changed from the base of the PR and between 149eacd and aa5dee7.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (12)
  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.yaml
✅ Files skipped from review due to trivial changes (8)
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/OWNERS
  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.metadata.json
🚧 Files skipped from review as they are similar to previous changes (4)
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh

Walkthrough

Adds a weekly LVMS "CI Doctor" workflow: a doctor step that runs Claude and writes artifacts, a post step to fetch the HTML report, step/workflow refs and metadata, OWNERS files, and wiring into the edge-tooling CI config with Slack reporting.

Changes

LVMS CI Doctor

| Layer | File(s) | Summary |
| --- | --- | --- |
| Doctor step implementation | ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh | Bash runner: prepares workdir/CLAUDE_HOME, installs an exit trap to copy/archive artifacts, writes Claude settings/permissions, runs claude with timeout/streaming JSON, and enforces success by parsing the analysis log. |
| Step ref, metadata, and OWNERS | ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.yaml, ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.metadata.json, ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/OWNERS, ci-operator/step-registry/openshift/edge-tooling/lvms-ci/OWNERS | Defines the doctor step (image, command, credential mount, env defaults, resources, timeout), populates metadata.json, and adds OWNERS for approvers/reviewers. |
| Workflow wiring and metadata | ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.yaml, ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.metadata.json | Adds the openshift-edge-tooling-lvms-ci-doctor workflow that runs the doctor ref in test and the post ref in post, with workflow metadata. |
| Post step: download/report summary | ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh, ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.yaml, ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.metadata.json, ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/OWNERS | Post-step script downloads LVMS CI Doctor HTML report from GCS into ARTIFACT_DIR when ${SHARED_DIR}/claude-report-available exists; includes ref, metadata, and OWNERS. |
| Main CI job configuration | ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml | Adds the lvms-ci-doctor job to tests with a weekly cron, Slack reporter config for #team-ocp-edge-collab, an LVMS report template linking the report artifact and CI logs, and references the workflow. |
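
The post step's marker-gated download, as summarized in the table, might look roughly like the sketch below. The real GCS URL construction is not shown in this conversation, so `REPORT_URL` is a placeholder, and `report_available` / `fetch_report` plus the output file name are hypothetical. The `--max-time 300` value is the one mentioned later in the review.

```shell
#!/bin/bash
set -euo pipefail

# ci-operator normally provides these; temp directories keep the sketch standalone.
ARTIFACT_DIR="${ARTIFACT_DIR:-$(mktemp -d)}"
SHARED_DIR="${SHARED_DIR:-$(mktemp -d)}"
# Placeholder only: the real GCS URL is built by the script and is not shown in the PR text.
REPORT_URL="${REPORT_URL:-https://example.invalid/lvms-ci-doctor-report.html}"

# Hypothetical helper: succeed only when the doctor step left its marker file.
report_available() {
    [ -f "${SHARED_DIR}/claude-report-available" ]
}

# Hypothetical helper: download the report into ARTIFACT_DIR.
# --max-time 300 matches the value cited in the review; the rest is illustrative.
fetch_report() {
    curl --fail --silent --show-error --location --max-time 300 \
        --output "${ARTIFACT_DIR}/lvms-ci-doctor-report.html" "${REPORT_URL}"
}

if report_available; then
    fetch_report
else
    echo "No report marker found; skipping download"
fi
```

Gating on the marker keeps the post step from failing when the doctor step never produced a report.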

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

lgtm, rehearsals-ack

Suggested reviewers

  • fonta-rh
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically identifies the main change: adding a new LVMS CI Doctor periodic job, with proper reference to the Jira ticket (OCPEDGE-2687). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Not applicable. PR contains no Ginkgo test files or test definitions, only CI configuration (YAML), operational scripts (Bash), and metadata files. No test titles to validate. |
| Test Structure And Quality | ✅ Passed | Custom check for Ginkgo test code quality is not applicable. PR contains only CI/CD configuration (YAML), shell scripts, OWNERS, and metadata files; no Go test files or Ginkgo test code. |
| Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests are being added in this PR. All changes are CI infrastructure files (YAML configs, JSON metadata, bash scripts, OWNERS files). The custom check does not apply. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR adds no Ginkgo e2e tests. Changes are CI/CD infrastructure only: YAML workflows, JSON metadata, Bash scripts, and OWNERS files. SNO compatibility check not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds CI/CD infrastructure only (step registry refs, workflows, periodic job config). No Kubernetes deployment manifests, operator code, or controllers with scheduling constraints are introduced. |
| Ote Binary Stdout Contract | ✅ Passed | PR adds CI infrastructure (shell scripts, YAML, JSON metadata). No Go files or OTE binaries. Check is inapplicable to non-Go CI configuration. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds CI/CD infrastructure only (YAML configs, bash scripts, metadata), not Ginkgo e2e tests. Custom check applies only to Ginkgo e2e tests; not applicable here. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from fonta-rh and fracappa May 15, 2026 17:02
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh`:
- Line 3: The script currently enables global shell tracing with set -x which
causes sensitive credential operations (the block that executes the credentialed
Claude invocation / commands that use mounted credentials) to be logged; wrap
that sensitive section by inserting set +x immediately before the credentialed
Claude execution block and restore tracing with set -x immediately after so
tracing is disabled only for the credential-handling commands and re-enabled for
the rest of the script.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a380b29a-5a72-4282-9905-452be692b28d

📥 Commits

Reviewing files that changed from the base of the PR and between c3f24c9 and 1550e2e.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (7)
  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.yaml

@@ -0,0 +1,109 @@
#!/bin/bash
set -euo pipefail
set -x
Contributor

@coderabbitai coderabbitai Bot May 15, 2026

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Disable xtrace during credentialed Claude execution.

Line 3 enables tracing globally, so Line 102 execution runs under set -x while using mounted credentials. Temporarily disable tracing for sensitive sections, then restore it.

Proposed fix
 set -euo pipefail
 set -x
@@
 echo "Running Claude to analyze LVMS CI jobs..."
+set +x
 timeout 3000 claude \
     --model "${CLAUDE_MODEL}" \
     --max-turns 100 \
     --output-format stream-json \
     --plugin-dir "${PLUGIN_DIR}" \
     -p "/lvms-ci:doctor ${RELEASE_VERSIONS}" \
     --verbose 2>&1 | tee "${CLAUDE_ANALYSIS_LOG}"
+set -x
 echo "Analysis for LVMS CI jobs completed"

As per coding guidelines: "When writing step registry command scripts, protect sensitive information ... Disable shell tracing (set +x) temporarily when handling sensitive operations in step registry scripts, then restore it after."

Also applies to: 102-108
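
The pattern the reviewer proposes (turn tracing off around a sensitive section, then restore it) can be sketched as below. The helper names `xtrace_pause` / `xtrace_resume` and the dummy token are hypothetical and do not come from the PR; the one refinement over a bare `set +x` / `set -x` pair is that tracing is re-enabled only if it was on beforehand.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helpers: pause xtrace and later restore whatever state was
# active before, rather than unconditionally turning tracing back on.
xtrace_pause() {
    case "$-" in *x*) XTRACE_WAS_ON=1 ;; *) XTRACE_WAS_ON=0 ;; esac
    { set +x; } 2>/dev/null   # hide the trace line for "set +x" itself
}

xtrace_resume() {
    if [ "${XTRACE_WAS_ON:-0}" -eq 1 ]; then set -x; fi
}

demo() {
    set -x
    echo "traced step"
    xtrace_pause
    local token="dummy-token-value"   # stands in for a real credential
    printf 'token length: %s\n' "${#token}"
    xtrace_resume
    echo "tracing restored"
    { set +x; } 2>/dev/null
}

demo
```

Whether a given script needs this depends on whether anything sensitive ever reaches a command line or variable assignment while tracing is on, which is the point debated in the replies to this comment.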

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh`
at line 3, The script currently enables global shell tracing with set -x which
causes sensitive credential operations (the block that executes the credentialed
Claude invocation / commands that use mounted credentials) to be logged; wrap
that sensitive section by inserting set +x immediately before the credentialed
Claude execution block and restore tracing with set -x immediately after so
tracing is disabled only for the credential-handling commands and re-enabled for
the rest of the script.

Contributor Author

The script doesn't handle any sensitive credentials. There are no mounted secrets, no API tokens read from files, and no credential variables being set. The claude command on line 102 uses --model (a model name like "sonnet") and --plugin-dir (a filesystem path); neither is sensitive. The Claude CLI authenticates via its own internal mechanism (baked into the container image), not via environment variables or mounted secrets that would leak in trace output.

The set -x tracing here only logs filesystem paths, non-secret environment variables (EDGE_TOOLING_DIR, RELEASE_VERSIONS, CLAUDE_MODEL), and standard shell operations. There's nothing to protect.

@kasturinarra kasturinarra force-pushed the lvms-ci-doctor branch 2 times, most recently from baaa715 to d080042 Compare May 18, 2026 07:55
@kasturinarra kasturinarra changed the title from "Add LVMS CI Doctor periodic job" to "OCPEDGE-2687: Add LVMS CI Doctor periodic job" May 18, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) May 18, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 18, 2026

@kasturinarra: This pull request references OCPEDGE-2687 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

OpenShift CI Infrastructure: LVMS CI Doctor Periodic Job

This PR adds a weekly, AI-driven diagnostic periodic job to the OpenShift CI configuration for the edge-tooling component (openshift-eng/edge-tooling) in the openshift/release repository. It introduces a new step-registry reference, a workflow, a command script, ownership metadata, and wires a periodic Prow job that runs against main.

What changed (practical impact)

  • New periodic Prow job: schedules a weekly run (Sundays 17:00 UTC) on main that executes the LVMS CI doctor and posts results (HTML report link and CI logs) to the #team-ocp-edge-collab Slack channel.
  • Step registry & workflow: new step and workflow under ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/:
    • A step ref that uses the ci/edge-tooling-ai-helpers image, runs the included command script, mounts a service-account token for Claude/Google credentials, sets Claude/GCP-related env defaults (Vertex toggle, region, project, GOOGLE_APPLICATION_CREDENTIALS, CLAUDE model, RELEASE_VERSIONS), requests 2000m CPU / 4Gi memory, and sets a 1h timeout with 10m grace.
    • A single-step workflow invoking that ref.
    • Metadata and OWNERS files assigning approvers/reviewers to openshift-edge-approvers / openshift-edge-reviewers.
  • Command script: adds openshift-edge-tooling-lvms-ci-doctor-commands.sh which:
    • Prepares a Claude session (settings.json) granting necessary file/plugin permissions and enabling the lvms-ci:doctor skill.
    • Runs claude with a long timeout and streaming JSON output to a claude-analysis.log, invoking the lvms-ci:doctor prompt against the edge-tooling source.
    • On exit, collects HTML/JSON/TXT/LOG artifacts into ARTIFACT_DIR, archives Claude session projects, and validates a successful Claude result event before finishing.
  • CI configuration: adds the lvms-ci-doctor test entry to ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml and the autogenerated periodic job entry in ci-operator/jobs/.../openshift-eng-edge-tooling-main-periodics.yaml so prowgen will create the periodic.
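
Validating a successful result event from the streamed log, as the command-script bullets describe, could be done along these lines. This is a sketch under stated assumptions: it assumes the streaming JSON log contains one JSON object per line and that the final event carries `"type":"result"` with `"subtype":"success"`; those field names and the helper `has_successful_result` are assumptions, not taken from the PR. `awk` is used instead of a `grep | grep -q` pipeline so that `pipefail` cannot trip on an early-exiting reader.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical checker: succeed only if the log holds a line that looks like a
# successful final "result" event. The field names are an assumption about
# the stream-json format.
has_successful_result() {
    awk '
        /"type"[ \t]*:[ \t]*"result"/ &&
        /"subtype"[ \t]*:[ \t]*"success"/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Stand-in log; a real run would tee the Claude output into this file.
CLAUDE_ANALYSIS_LOG="$(mktemp)"
cat > "${CLAUDE_ANALYSIS_LOG}" <<'EOF'
{"type":"assistant","message":"analyzing LVMS periodic jobs"}
{"type":"result","subtype":"success","is_error":false}
EOF

if has_successful_result "${CLAUDE_ANALYSIS_LOG}"; then
    echo "Claude reported a successful result event"
else
    echo "No successful result event found" >&2
    exit 1
fi
```

A check of this kind makes the step fail when the CLI exits cleanly but never produced a final result, for example after hitting a turn limit.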

Behavioral notes / limitations

  • The job uses the edge-tooling-ai-helpers image (Claude Code preinstalled) and the lvms-ci:doctor plugin from openshift-eng/edge-tooling to analyze LVMS periodic job failures and produce an HTML report.
  • This doctor is intentionally simpler than the MicroShift CI doctor: no Jira integration, no GitHub app token usage, no automatic bug creation, and no PR rebase handling; it targets only the main branch.
  • Tests required: CI rehearsal must pass, step-registry metadata generation must validate, and the first scheduled run should produce the HTML report artifact.

Files added/modified (high-level)

  • New step-registry files, workflow, script, metadata and OWNERS under ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/
  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml updated to include the lvms-ci-doctor test
  • Auto-generated periodic job entry added in ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml

Overall effect

  • Extends OpenShift CI diagnostics for the edge-tooling area by adding a scheduled, AI-driven LVMS periodic-job analyzer that uploads an HTML report to CI artifacts and notifies the edge Slack channel.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@kasturinarra
Contributor Author

/pj-rehearse periodic-ci-openshift-eng-edge-tooling-main-lvms-ci-doctor

@openshift-merge-bot
Contributor

@kasturinarra: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh`:
- Line 3: Remove the global tracing flag and make the script use safe defaults:
replace the top-level "set -x" with "set -euo pipefail" (i.e., ensure the script
initializes strict error handling) and, if debugging is needed, enable xtrace
only around the specific short command sections by wrapping those commands with
a temporary "set -x" / "set +x" pair; target the line containing "set -x" and
any debug blocks in this script (the top-level invocation) to implement the
change.

In
`@ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.yaml`:
- Line 13: The post-step timeout is too short (currently "timeout: 1m0s") and
can kill the pod before the curl download (which uses --max-time 300) completes;
update the timeout field in openshift-edge-tooling-lvms-ci-post-ref.yaml (the
"timeout: 1m0s" entry) to a value larger than the curl window (for example
"timeout: 6m0s") so the step allows enough time for retries and downloads to
finish.
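
The arithmetic behind the timeout finding can be made explicit with a small sketch. The helper `duration_to_seconds` is hypothetical and handles only plain `h`/`m`/`s` durations such as `1m0s`; the two step timeouts and the 300-second curl window are the values quoted in the comment.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: convert a Go-style duration such as "1m0s" or "1h" to seconds.
duration_to_seconds() {
    local rest="$1" total=0 value unit
    while [[ "${rest}" =~ ^([0-9]+)([hms])(.*)$ ]]; do
        value="${BASH_REMATCH[1]}"
        unit="${BASH_REMATCH[2]}"
        rest="${BASH_REMATCH[3]}"
        case "${unit}" in
            h) total=$(( total + 10#${value} * 3600 )) ;;
            m) total=$(( total + 10#${value} * 60 )) ;;
            s) total=$(( total + 10#${value} )) ;;
        esac
    done
    # Anything left over means the input was not a plain h/m/s duration.
    [ -z "${rest}" ] || return 1
    echo "${total}"
}

CURL_MAX_TIME=300   # from the review comment: curl runs with --max-time 300

for step_timeout in 1m0s 6m0s; do
    seconds="$(duration_to_seconds "${step_timeout}")"
    if [ "${seconds}" -gt "${CURL_MAX_TIME}" ]; then
        echo "${step_timeout} (${seconds}s) leaves room for a ${CURL_MAX_TIME}s download"
    else
        echo "${step_timeout} (${seconds}s) can expire before a ${CURL_MAX_TIME}s download finishes"
    fi
done
```

The comparison covers a single transfer only; if the script also retries, the step timeout would need to cover the sum of all attempts.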

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 810a692e-407e-4a6f-a47b-15e20cdc817f

📥 Commits

Reviewing files that changed from the base of the PR and between d080042 and 149eacd.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (12)
  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.yaml
✅ Files skipped from review due to trivial changes (3)
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-ref.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.metadata.json
🚧 Files skipped from review as they are similar to previous changes (7)
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-workflow.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/OWNERS
  • ci-operator/config/openshift-eng/edge-tooling/openshift-eng-edge-tooling-main.yaml
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/OWNERS
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-ref.metadata.json
  • ci-operator/step-registry/openshift/edge-tooling/lvms-ci/doctor/openshift-edge-tooling-lvms-ci-doctor-commands.sh

@@ -0,0 +1,30 @@
#!/bin/bash
set -euo pipefail
set -x
Contributor

@coderabbitai coderabbitai Bot May 18, 2026

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Avoid always-on xtrace in step-registry scripts.

set -x is enabled globally. Keep tracing off by default and only enable it for short debug sections to reduce log leakage risk.

As per coding guidelines: "Step registry command scripts should use set -euo pipefail by default and only enable set -x for debugging, avoiding accidental disclosure of passwords, tokens, API keys, cluster URLs, or kubeconfig contents".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh`
at line 3, Remove the global tracing flag and make the script use safe defaults:
replace the top-level "set -x" with "set -euo pipefail" (i.e., ensure the script
initializes strict error handling) and, if debugging is needed, enable xtrace
only around the specific short command sections by wrapping those commands with
a temporary "set -x" / "set +x" pair; target the line containing "set -x" and
any debug blocks in this script (the top-level invocation) to implement the
change.

Contributor

There are no secrets in this file. Debugging can be enabled unconditionally.

@coderabbitai

@ggiguash, that's a fair point — looking at the script, it only constructs a GCS URL and downloads an HTML report via curl, with no credentials, tokens, or sensitive values in scope. The concern doesn't apply here. Happy to dismiss this.


✏️ Learnings added
Learnt from: ggiguash
Repo: openshift/release PR: 79349
File: ci-operator/step-registry/openshift/edge-tooling/lvms-ci/post/openshift-edge-tooling-lvms-ci-post-commands.sh:3-3
Timestamp: 2026-05-18T09:59:38.396Z
Learning: In the openshift/release repository, when reviewing step-registry command scripts for `set -x` usage, only flag it as a potential issue if the script actually handles secrets, credentials, tokens, API keys, kubeconfig contents, or other sensitive values. Scripts that purely perform non-sensitive operations (e.g., URL construction and downloading public artifacts via curl) can safely use `set -x` unconditionally.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: tthvo
Repo: openshift/release PR: 79214
File: ci-operator/step-registry/upi/install/aws/usgov/private/cluster/upi-install-aws-usgov-private-cluster-commands.sh:834-836
Timestamp: 2026-05-13T04:27:10.115Z
Learning: When reviewing shell scripts that invoke the AWS CLI, do not flag use of `--filter` as incorrect if the intended argument is `--filters` (plural). AWS CLI supports prefix matching for argument names, so `--filter` is an accepted alias for `--filters` for relevant `aws ec2 describe-*` commands (e.g., `describe-availability-zones`, `describe-instances`, `describe-subnets`). Both forms should be treated as equivalent.

Add step registry ref and workflow for the LVMS CI doctor that
analyzes LVMS periodic job failures using Claude AI. Runs every
Sunday at 5 PM UTC (after weekly qe-integration-tests complete)
and reports results to #team-ocp-edge-collab.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-merge-bot

[REHEARSALNOTIFIER]
@kasturinarra: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| periodic-ci-openshift-eng-edge-tooling-main-lvms-ci-doctor | N/A | periodic | Periodic changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@kasturinarra

/pj-rehearse periodic-ci-openshift-eng-edge-tooling-main-lvms-ci-doctor

@openshift-merge-bot

@kasturinarra: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci

openshift-ci Bot commented May 18, 2026

@kasturinarra: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ggiguash

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 18, 2026
@openshift-ci

openshift-ci Bot commented May 18, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, kasturinarra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kasturinarra

/pj-rehearse ack

@openshift-merge-bot

@kasturinarra: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot openshift-merge-bot Bot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label May 18, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 9f3b594 into openshift:main May 18, 2026
19 checks passed
wgahnagl pushed a commit to wgahnagl/release that referenced this pull request May 20, 2026
Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lgtm: Indicates that a PR is ready to be merged.
rehearsals-ack: Signifies that rehearsal jobs have been acknowledged.

3 participants