
SREP-4870: Fix rosa-e2e-test script crash on Classic STS clusters #79350

Open
dustman9000 wants to merge 1 commit into openshift:main from dustman9000:fix/classic-sts-test-script

Conversation

@dustman9000
Member

dustman9000 commented May 15, 2026

Summary

  • The rosa-e2e-test step script crashes on Classic STS clusters because ocm get .../hypershift returns non-zero, which under set -o errexit kills the entire script before reaching the test execution block
  • Adds || true to the hypershift endpoint pipeline so MC_NAME is set to empty for Classic STS, allowing the script to proceed to tests
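The crash described above can be reproduced outside CI with a minimal sketch. It does not call `ocm`; `false` stands in for the failing hypershift lookup, and the two one-line scripts and the variable names are hypothetical, chosen only to contrast the unguarded and guarded assignment under `set -o errexit`.

```shell
#!/usr/bin/env bash
# Hypothetical local reproduction, not the CI step itself.
# `false` stands in for the failing `ocm get .../hypershift` call.
unguarded='set -o errexit; MC_NAME=$(false); echo "reached tests"'
guarded='set -o errexit; MC_NAME=$(false) || true; echo "reached tests"'

# Run each variant in a child bash and capture its output and exit status.
UNGUARDED_OUT=$(bash -c "${unguarded}") && UNGUARDED_RC=0 || UNGUARDED_RC=$?
GUARDED_OUT=$(bash -c "${guarded}") && GUARDED_RC=0 || GUARDED_RC=$?

echo "unguarded: rc=${UNGUARDED_RC} out='${UNGUARDED_OUT}'"
echo "guarded:   rc=${GUARDED_RC} out='${GUARDED_OUT}'"
```

The unguarded variant exits before printing anything, because the assignment takes the exit status of the failed command substitution. The guarded variant reaches the `echo`.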

Test plan

  • Verify Classic STS nightly jobs no longer crash at the MC access step
  • Verify HCP nightly jobs still correctly detect and use management cluster access

Jira: https://redhat.atlassian.net/browse/SREP-4870

Fixed rosa-e2e-test script crash on Classic STS clusters

The rosa-e2e-test CI step script now gracefully handles Classic STS clusters that don't support the hypershift management cluster API endpoint.

The Problem: The script uses set -o errexit to exit on any command failure. When testing Classic STS clusters, the ocm get .../hypershift API call returns a non-zero exit status, causing the entire script to crash before reaching the test execution block.

The Fix: Added || true to the management cluster discovery pipeline (line 50 of rosa-e2e-test-commands.sh). The lookup can now fail without aborting the script: when the hypershift endpoint is unavailable, MC_NAME is set to an empty value and the script continues to the "no management cluster found" path rather than crashing.

The change is minimal and surgical: it tolerates the expected failure on Classic STS clusters while preserving the correct detection and use of management cluster access for HCP (Hosted Control Plane) nightly jobs, which do support the hypershift endpoint.
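A minimal sketch of the fallback path follows. The function `failing_lookup` and the `RESULT` variable are hypothetical stand-ins introduced for illustration; the real script's lookup command and branch bodies are not shown in this PR page and are not reproduced here.

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

# Hypothetical stand-in for the hypershift lookup on a Classic STS cluster:
# prints nothing on stdout and exits non-zero.
failing_lookup() {
  echo "simulated API error" >&2
  return 1
}

# The guard keeps errexit from firing; the assignment still happens,
# so MC_NAME is set (to the empty string) and nounset is satisfied.
MC_NAME=$(failing_lookup) || true

if [[ -z "${MC_NAME}" ]]; then
  RESULT="no management cluster found"
else
  RESULT="management cluster: ${MC_NAME}"
fi
echo "${RESULT}"
```

Because the variable is assigned even when the substitution fails, a later `-z` test is enough to choose between the two paths.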

The hypershift endpoint returns an error for Classic STS clusters,
which under set -o errexit kills the entire script. Add || true so
MC_NAME gets set to empty and the script continues to the test
execution block.

Jira: https://redhat.atlassian.net/browse/SREP-4870
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 15, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 15, 2026

@dustman9000: This pull request references SREP-4870 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4e595ee6-c974-48a5-8aff-e4a85fba3c0e

📥 Commits

Reviewing files that changed from the base of the PR and between c3f24c9 and 39a676d.

📒 Files selected for processing (1)
  • ci-operator/step-registry/rosa/e2e/test/rosa-e2e-test-commands.sh

Walkthrough

The ROSA e2e test script's management cluster discovery now tolerates OCM API and JSON extraction failures. The MC_NAME assignment adds || true so that a failed lookup no longer exits the script under set -e; MC_NAME is left empty and the script falls back to the "no management cluster found" path instead of aborting.

Changes

Management Cluster Discovery Resilience

| Layer / File(s) | Summary |
| --- | --- |
| OCM hypershift cluster endpoint error tolerance (ci-operator/step-registry/rosa/e2e/test/rosa-e2e-test-commands.sh) | The MC_NAME lookup adds || true to tolerate OCM API call or jq failures, allowing the script to proceed to the "no management cluster found" path instead of exiting when set -e is enabled. |
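The point about tolerating a failure in either the API call or the extraction step can be sketched as follows. Everything named here is an assumption for illustration: `fake_ocm_get` replaces the real `ocm` call, `extract_name` replaces the jq step (implemented with `sed` so the sketch has no external dependency), and the `management_cluster` field and `mc-example` value are made up.

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

# Hypothetical stand-in for the OCM call: JSON for "hcp", failure otherwise.
fake_ocm_get() {
  if [[ "$1" == "hcp" ]]; then
    echo '{"management_cluster":"mc-example"}'
  else
    return 1
  fi
}

# Hypothetical stand-in for the jq extraction; fails if the field is absent.
extract_name() {
  local json
  json=$(cat)
  [[ "${json}" == *'"management_cluster"'* ]] || return 1
  sed -E 's/.*"management_cluster":"([^"]*)".*/\1/' <<<"${json}"
}

# With pipefail, a failure in either stage fails the whole pipeline;
# the trailing || true covers both stages at once.
HCP_NAME=$(fake_ocm_get hcp | extract_name) || true
CLASSIC_NAME=$(fake_ocm_get classic | extract_name) || true

echo "hcp: '${HCP_NAME}'"
echo "classic: '${CLASSIC_NAME}'"
```

One trade-off of this pattern is that the guard treats every failure the same way, so an unexpected lookup error on an HCP cluster would also produce an empty value and take the fallback path.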

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

lgtm

Suggested reviewers

  • bmeng
  • neisw
🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly describes the main fix: addressing a crash in the rosa-e2e-test script that occurs specifically on Classic STS clusters due to the hypershift endpoint lookup failure. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR modifies a shell script that orchestrates test execution, not Ginkgo test code. No test names are defined or changed, so this check is not applicable. |
| Test Structure And Quality | ✅ Passed | The custom check for Ginkgo test quality is not applicable. The PR modifies a shell script (rosa-e2e-test-commands.sh), not Ginkgo test code. No Go test files were changed. |
| Microshift Test Compatibility | ✅ Passed | No Ginkgo e2e tests are added in this PR. The change is shell script infrastructure only (rosa-e2e-test-commands.sh). The MicroShift compatibility check applies only to new Ginkgo test definitions. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR does not add or modify any Ginkgo e2e tests. The changes are to CI/CD shell scripts and configuration files only. The SNO test compatibility check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | The PR modifies only a CI test script (rosa-e2e-test-commands.sh), not deployment manifests, operator code, or controllers. The topology-aware scheduling check does not apply. |
| Ote Binary Stdout Contract | ✅ Passed | The check targets Go OTE binaries. This PR only modifies a shell script, which is not subject to OTE binary stdout contracts. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | Check applies only to new Ginkgo e2e tests. This PR modifies a bash shell script that configures test execution, not a test file. No new tests added. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from ravitri and tiwillia May 15, 2026 17:05
@openshift-ci
Contributor

openshift-ci Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dustman9000

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2026
@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@dustman9000: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-online-rosa-e2e-main-e2e-rosa-classic-smoke | openshift-online/rosa-e2e | presubmit | Registry content changed |
| pull-ci-openshift-online-rosa-e2e-main-e2e-rosa-hcp-smoke | openshift-online/rosa-e2e | presubmit | Registry content changed |
| periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-classic-sts-e2e-stable-4-20 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-hcp-e2e-stable-4-20 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-hcp-e2e-candidate-4-22 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-hcp-e2e-stable-4-21 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-hcp-e2e-nightly-5-0 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-classic-sts-e2e-candidate-4-22 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-classic-sts-e2e-stable-4-19 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-hcp-e2e-stable-4-19 | N/A | periodic | Registry content changed |
| periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-classic-sts-e2e-stable-4-21 | N/A | periodic | Registry content changed |

Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@dustman9000
Member Author

/pj-rehearse periodic-ci-openshift-online-rosa-e2e-main-periodics-rosa-classic-sts-e2e-stable-4-19

@openshift-merge-bot
Contributor

@dustman9000: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci
Contributor

openshift-ci Bot commented May 15, 2026

@dustman9000: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
