refactor(ci): encapsulate test run tracking into lib/test-run-tracker.sh by gustavolira · Pull Request #4366 · redhat-developer/rhdh

gustavolira · 2026-03-04T19:46:45Z

Summary

Introduces .ci/pipelines/lib/test-run-tracker.sh module that encapsulates the CURRENT_DEPLOYMENT global counter into a clean API (test_run_tracker::register, test_run_tracker::mark_deploy_success, test_run_tracker::mark_deploy_failed, test_run_tracker::mark_test_result)
Replaces manual counter manipulation in utils.sh and lib/testing.sh with single-call functions, reducing error-prone boilerplate
Removes the CURRENT_DEPLOYMENT export from reporting.sh — state is now managed internally by the test run tracker module

Test plan

Verify grep -rn "CURRENT_DEPLOYMENT" .ci/ returns zero matches outside lib/test-run-tracker.sh
Run shellcheck on modified files (lib/test-run-tracker.sh, lib/testing.sh, reporting.sh, utils.sh)
Validate source chain: reporting.sh → lib/test-run-tracker.sh loads correctly
Run CI pipeline to confirm functional equivalence (counter increments identically)

🤖 Generated with Claude Code

github-actions · 2026-03-04T19:49:18Z

The container image build workflow finished with status: cancelled.

github-actions · 2026-03-04T20:10:02Z

The container image build workflow finished with status: cancelled.

github-actions · 2026-03-04T20:14:00Z

The container image build workflow finished with status: cancelled.

gustavolira · 2026-03-04T20:38:45Z

/review

rhdh-qodo-merge · 2026-03-04T20:39:20Z

PR Reviewer Guide 🔍

(Review updated until commit `fdca364`)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🔒 Security concerns Supply chain / remote script execution: `.ci/pipelines/install-methods/operator.sh` downloads a script from GitHub based on `${RELEASE_BRANCH_NAME}` and makes it executable. If that file is later executed (likely), this introduces a risk of running unpinned remote code. Consider pinning to a specific commit/tag, validating `${RELEASE_BRANCH_NAME}` against an allowlist, and/or verifying integrity (checksum/signature) before execution.
⚡ Recommended focus areas for review State Consistency The deployment counter is intentionally not exported, while the functions are exported for subshell usage. Please validate that no call sites invoke `deployment::` in a subshell (e.g., `timeout bash -c ...`, pipes, command substitutions) in a way that increments `_DEPLOYMENT_COUNTER` outside the parent shell, since this can desynchronize IDs from the parent process and lead to statuses being written under unexpected deployment indices. # Internal state _DEPLOYMENT_COUNTER=0 deployment::next_id() { _DEPLOYMENT_COUNTER=$((_DEPLOYMENT_COUNTER + 1)) echo "${_DEPLOYMENT_COUNTER}" } deployment::current_id() { echo "${_DEPLOYMENT_COUNTER}" } deployment::register() { local label="$1" deployment::next_id > /dev/null save_status_deployment_namespace "${_DEPLOYMENT_COUNTER}" "$label" } deployment::mark_deploy_success() { save_status_failed_to_deploy "${_DEPLOYMENT_COUNTER}" false } deployment::mark_deploy_failed() { local label="$1" deployment::register "$label" save_status_failed_to_deploy "${_DEPLOYMENT_COUNTER}" true save_status_test_failed "${_DEPLOYMENT_COUNTER}" true save_overall_result 1 } deployment::mark_test_result() { local passed="$1" local num_failures="${2:-}" if [[ "$passed" == "true" ]]; then save_status_test_failed "${_DEPLOYMENT_COUNTER}" false else save_status_test_failed "${_DEPLOYMENT_COUNTER}" true fi if [[ -n "$num_failures" ]]; then save_status_number_of_test_failed "${_DEPLOYMENT_COUNTER}" "$num_failures" fi } # Export all functions for subshell compatibility. # Note: _DEPLOYMENT_COUNTER is NOT exported because subshells inherit only # the snapshot at fork time — counter updates in the parent would not propagate. export -f deployment::next_id export -f deployment::current_id export -f deployment::register export -f deployment::mark_deploy_success export -f deployment::mark_deploy_failed export -f deployment::mark_test_result Double Registration* `deployment::mark_deploy_failed` calls `deployment::register`, which always increments the counter. Confirm that failure paths never call `deployment::mark_deploy_failed` after an earlier `deployment::register` for the same logical deployment, otherwise you’ll get an extra deployment entry (and the failure will be attributed to the next ID rather than the already-registered one). deployment::register() { local label="$1" deployment::next_id > /dev/null save_status_deployment_namespace "${_DEPLOYMENT_COUNTER}" "$label" } deployment::mark_deploy_success() { save_status_failed_to_deploy "${_DEPLOYMENT_COUNTER}" false } deployment::mark_deploy_failed() { local label="$1" deployment::register "$label" save_status_failed_to_deploy "${_DEPLOYMENT_COUNTER}" true save_status_test_failed "${_DEPLOYMENT_COUNTER}" true save_overall_result 1 }
📚 Focus areas based on broader codebase context Exit Code Failure paths now call `deployment::mark_deploy_failed ...` but then return `0` to the caller (e.g., when Backstage is not running or a Helm upgrade times out). Validate that the top-level CI entrypoints still reliably fail the job (non-zero exit) when a deployment/test fails, rather than only recording `OVERALL_RESULT`. (Ref 1, Ref 5) else echo "Backstage is not running. Marking deployment as failed and continuing..." deployment::mark_deploy_failed "$artifacts_subdir" save_all_pod_logs "$namespace" return 0 fi # Collect pod logs only on test failure to speed up successful PR runs. local _current_id _current_id="$(deployment::current_id)" if [[ "${STATUS_TEST_FAILED[$_current_id]:-}" == "true" ]]; then save_all_pod_logs "$namespace" else log::info "Tests passed — skipping pod log collection for namespace: ${namespace}" fi return 0 } # ============================================================================== # Upgrade Verification # ============================================================================== # Check Helm upgrade rollout status # Args: # $1 - deployment_name: The name of the deployment # $2 - namespace: The namespace where the deployment is located # $3 - timeout: Timeout in seconds (default: 600) # Returns: # 0 - Upgrade completed successfully # 1 - Upgrade failed or timed out testing::check_helm_upgrade() { local deployment_name="$1" local namespace="$2" local timeout="${3:-600}" if [[ -z "$deployment_name" \|\| -z "$namespace" ]]; then log::error "${_TESTING_ERR_MISSING_PARAMS}" log::info "Usage: testing::check_helm_upgrade <deployment_name> <namespace> [timeout]" return 1 fi log::info "Checking rollout status for deployment: ${deployment_name} in namespace: ${namespace}..." if oc rollout status "deployment/${deployment_name}" -n "${namespace}" --timeout="${timeout}s" -w; then log::info "RHDH upgrade is complete." return 0 else log::error "RHDH upgrade encountered an issue or timed out." return 1 fi } # Check upgrade and run tests if successful # Args: # $1 - deployment_name: The name of the deployment # $2 - release_name: The Helm release name # $3 - namespace: The namespace # $4 - playwright_project: The Playwright project to run # $5 - url: The URL to test against # $6 - timeout: (optional) Timeout in seconds (default: 600) testing::check_upgrade_and_test() { local deployment_name="$1" local release_name="$2" local namespace="$3" local playwright_project="$4" local url=$5 local timeout=${6:-600} if [[ -z "$deployment_name" \|\| -z "$release_name" \|\| -z "$namespace" \|\| -z "$playwright_project" \|\| -z "$url" ]]; then log::error "${_TESTING_ERR_MISSING_PARAMS}" log::info "Usage: testing::check_upgrade_and_test <deployment_name> <release_name> <namespace> <playwright_project> <url> [timeout]" return 1 fi if testing::check_helm_upgrade "${deployment_name}" "${namespace}" "${timeout}"; then testing::check_and_test "${release_name}" "${namespace}" "${playwright_project}" "${url}" else log::error "Helm upgrade encountered an issue or timed out. Exiting..." deployment::mark_deploy_failed "$namespace" fi return 0 Reference reasoning: Existing e2e runner scripts treat deployment/container failures as hard failures by exiting with the captured non-zero exit code. This indicates the expected behavior is to propagate failure via process exit status, not just via reporting state files.
📄 References redhat-developer/rhdh/e2e-tests/container-init.sh [177-182] redhat-developer/rhdh/e2e-tests/container-init.sh [136-175] redhat-developer/rhdh/e2e-tests/container-init.sh [86-117] redhat-developer/rhdh/e2e-tests/container-init.sh [47-84] redhat-developer/rhdh/e2e-tests/local-run.sh [442-470] redhat-developer/rhdh/e2e-tests/local-run.sh [403-426] redhat-developer/rhdh/e2e-tests/local-run.sh [338-360] redhat-developer/rhdh/e2e-tests/local-run.sh [1-12]

gustavolira · 2026-03-04T20:48:36Z

/retest

github-actions · 2026-03-04T20:53:57Z

Image was built and published successfully. It is available at:

zdrapela · 2026-03-05T09:40:41Z

/test e2e-ocp-helm-nightly

zdrapela

Please make sure that each file that uses functions from other files sources all the files that it depends on - to ensure dependencies.

We should definitely check that for job triggered by /test e2e-ocp-helm-nightly, all the files in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/redhat-developer_rhdh/4366/pull-ci-redhat-developer-rhdh-main-e2e-ocp-helm-nightly/2029492297959215104/artifacts/e2e-ocp-helm-nightly/redhat-developer-rhdh-ocp-helm-nightly/artifacts/reporting/ have the correct number of entries.

.ci/pipelines/cluster/aks/aks-helm-deployment.sh

.ci/pipelines/lib/namespace.sh

.ci/pipelines/cluster/aks/aks-helm-deployment.sh

.ci/pipelines/lib/testing.sh

.ci/pipelines/reporting.sh

.ci/pipelines/lib/deployment.sh

github-actions · 2026-03-06T15:00:17Z

Image was built and published successfully. It is available at:

github-actions · 2026-03-06T15:15:35Z

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

.ci/pipelines/lib/testing.sh

zdrapela · 2026-03-09T14:08:37Z

/review
-i

rhdh-qodo-merge · 2026-03-09T14:09:17Z

Persistent review updated to latest commit fdca364

zdrapela · 2026-03-09T14:10:02Z

/agentic_review

rhdh-qodo-merge · 2026-03-09T14:10:09Z

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. Missing failure-count entry 🐞 Bug ✧ Quality

Description

deployment::mark_test_result only writes STATUS_NUMBER_OF_TEST_FAILED when a non-empty value is
provided; testing::run_tests can pass an empty value if JUnit parsing returns no match, and
deployment-failure paths don’t set a placeholder count. This can lead to missing lines in
STATUS_NUMBER_OF_TEST_FAILED.txt, reducing report completeness and risking misalignment with other
per-deployment status files.

Code

.ci/pipelines/lib/deployment.sh[R42-52]

+deployment::mark_test_result() {
+  local passed="$1"
+  local num_failures="${2:-}"
+  if [[ "$passed" == "true" ]]; then
+    save_status_test_failed "${_DEPLOYMENT_COUNTER}" false
+  else
+    save_status_test_failed "${_DEPLOYMENT_COUNTER}" true
+  fi
+  if [[ -n "$num_failures" ]]; then
+    save_status_number_of_test_failed "${_DEPLOYMENT_COUNTER}" "$num_failures"
+  fi

Evidence

The deployment module conditionally persists failure counts only when num_failures is non-empty,
so any caller passing an empty string results in no line being appended to
STATUS_NUMBER_OF_TEST_FAILED.txt. testing::run_tests populates failed_tests via grep when
the JUnit file exists but does not default it if grep returns an empty string, and then passes it
through to deployment::mark_test_result. The documentation explicitly describes
STATUS_NUMBER_OF_TEST_FAILED.txt as having failure counts “one per line,” implying each deployment
should contribute an entry.

.ci/pipelines/lib/deployment.sh[42-53]
.ci/pipelines/lib/testing.sh[112-124]
.ci/pipelines/reporting.sh[47-53]
docs/e2e-tests/enhanced-ci-reporting.md[72-81]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`STATUS_NUMBER_OF_TEST_FAILED.txt` can miss an entry for a deployment because `deployment::mark_test_result` only persists the count when the value is non-empty, and callers may pass an empty string (e.g., if JUnit parsing yields no match). This reduces reporting completeness and can desynchronize per-deployment status files.

### Issue Context
- `deployment::mark_test_result` gates writing the failure count on `[[ -n &quot;$num_failures&quot; ]]`.
- `testing::run_tests` computes `failed_tests` via `grep` when the JUnit file exists, but does not default it if `grep` yields an empty result.
- Docs describe `STATUS_NUMBER_OF_TEST_FAILED.txt` as &quot;one per line&quot;, implying each deployment should contribute an entry.

### Fix Focus Areas
- .ci/pipelines/lib/deployment.sh[34-53]
- .ci/pipelines/lib/testing.sh[112-124]
- (optional for consistency) .ci/pipelines/lib/deployment.sh[34-40]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

ⓘ The new review experience is currently in Beta. Learn more

Previous review results

Review updated until commit fdca364

Results up to commit N/A

Looking for bugs?

Come back again in a few minutes. An AI review agent is analysing this pull request

zdrapela · 2026-03-09T14:58:28Z

/agentic_review

rhdh-qodo-merge · 2026-03-09T14:58:37Z

Persistent review updated to latest commit fdca364

zdrapela · 2026-03-10T12:55:07Z

/test e2e-ocp-helm-nightly

github-actions · 2026-03-10T13:24:08Z

Image was built and published successfully. It is available at:

github-actions · 2026-03-11T12:48:02Z

The container image build workflow finished with status: cancelled.

Replace fragile CURRENT_DEPLOYMENT global counter with a dedicated deployment module that provides a clean API (deployment::register, deployment::mark_deploy_success, deployment::mark_deploy_failed, deployment::mark_test_result). This eliminates manual counter manipulation scattered across utils.sh and lib/testing.sh. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Remove misleading `export _DEPLOYMENT_COUNTER` (subshells can't see parent counter updates anyway) - Source deployment.sh directly in testing.sh so job files that skip utils.sh/reporting.sh still have deployment::* available - Consolidate test result reporting into a single deployment::mark_test_result call instead of mixing old and new APIs - Add deployment::mark_deploy_success in initiate_upgrade_base_deployments to record deploy status consistently Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…eline scripts - Fix silent failures: cert extraction pipeline, helm chart version resolution, curl without -f flag, and http_status comparison - Add error checks to all helm upgrade commands in EKS/GKE/AKS deployments - Add common::require_vars() for envsubst variable validation - Remove dead functions: namespace::remove_finalizers, operator::install_postgres_k8s - Export common::base64_encode for subshell usage - Fix unquoted variables in kubectl commands and sleep Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Move common::require_vars to beginning of deployment functions for early validation - Add common::require_vars to operator deployment files (AKS, EKS, GKE) - Add explicit source for common.sh and namespace.sh in K8s deployment files - Restore namespace::remove_finalizers function (was prematurely removed) - Add source for reporting.sh in deployment.sh (fixes dependency declaration) - Add re-source guard to reporting.sh and fix circular dependency - Use artifacts_subdir (playwright project name) as deployment label instead of namespace in deployment::register to fix RHDHBUGS-2726 - Simplify log collection: collect immediately on deploy failure, check STATUS_TEST_FAILED only after test execution - Remove unnecessary "Uses globals: none" comment - Update docs/e2e-tests/enhanced-ci-reporting.md to reflect deployment:: API Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…atus file alignment Use playwright_project instead of namespace as the default deployment label, fixing Slack reports that showed duplicate names (RHDHBUGS-2726). Restructure testing::check_and_test to use testing::run_tests return value directly for pod log collection, simplifying the control flow. Ensure STATUS_NUMBER_OF_TEST_FAILED.txt always has an entry per deployment by defaulting num_failures to "0" and writing "N/A" for deploy failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…function prefix Rename the tracking library from `deployment` to `test_run_tracker` to better reflect its purpose: tracking test run status (deploy outcome + test results) for reporting, not managing deployments. - Rename file: lib/deployment.sh -> lib/test-run-tracker.sh - Rename function prefix: deployment:: -> test_run_tracker:: - Rename internal state: _DEPLOYMENT_COUNTER -> _TEST_RUN_COUNTER - Update all call sites in testing.sh and utils.sh Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…er rename - Update enhanced-ci-reporting.md to use test_run_tracker:: prefix and lib/test-run-tracker.sh file path - Clarify FUTURE MODULE comment in utils.sh to distinguish planned lib/deploy.sh from lib/test-run-tracker.sh Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-11T12:50:36Z

The container image build workflow finished with status: cancelled.

sonarqubecloud · 2026-03-11T12:50:58Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-03-11T13:34:53Z

Image was built and published successfully. It is available at:

zdrapela · 2026-03-11T14:03:35Z

/test e2e-ocp-helm-nightly

zdrapela · 2026-03-11T14:03:40Z

/test e2e-ocp-helm

zdrapela

Thank you! It's now looking great. I'll just wait for the nightly job to finish, and I'll approve.

openshift-ci bot requested review from jrichter1 and psrna March 4, 2026 19:46

rhdh-qodo-merge bot added the Review effort 3/5 label Mar 4, 2026

gustavolira force-pushed the refactor/deployment-tracking branch from ce7a406 to 88355be Compare March 6, 2026 14:20

zdrapela suggested changes Mar 6, 2026

View reviewed changes

openshift-ci bot assigned zdrapela Mar 6, 2026

zdrapela reviewed Mar 6, 2026

View reviewed changes

.ci/pipelines/lib/deployment.sh Outdated Show resolved Hide resolved

zdrapela reviewed Mar 9, 2026

View reviewed changes

.ci/pipelines/lib/testing.sh Show resolved Hide resolved

rhdh-qodo-merge bot added the Possible security concern label Mar 9, 2026

gustavolira force-pushed the refactor/deployment-tracking branch from fdca364 to d522a3d Compare March 10, 2026 12:43

gustavolira changed the title ~~refactor(ci): encapsulate deployment tracking into lib/deployment.sh~~ refactor(ci): encapsulate test run tracking into lib/test-run-tracker.sh Mar 11, 2026

gustavolira and others added 8 commits March 11, 2026 09:49

style(ci): fix prettier formatting in utils.sh

887edbf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gustavolira force-pushed the refactor/deployment-tracking branch from a4b2c9a to 798a2a0 Compare March 11, 2026 12:50

zdrapela reviewed Mar 11, 2026

View reviewed changes

zdrapela approved these changes Mar 11, 2026

View reviewed changes

openshift-ci bot added the lgtm label Mar 11, 2026

openshift-merge-bot bot merged commit ad49057 into redhat-developer:main Mar 11, 2026
17 checks passed

Conversation

gustavolira commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

github-actions bot commented Mar 4, 2026

Uh oh!

github-actions bot commented Mar 4, 2026

Uh oh!

github-actions bot commented Mar 4, 2026

Uh oh!

gustavolira commented Mar 4, 2026

Uh oh!

rhdh-qodo-merge bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit fdca364)

Uh oh!

gustavolira commented Mar 4, 2026

Uh oh!

github-actions bot commented Mar 4, 2026

Uh oh!

zdrapela commented Mar 5, 2026

Uh oh!

zdrapela left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

Uh oh!

zdrapela commented Mar 9, 2026

Uh oh!

rhdh-qodo-merge bot commented Mar 9, 2026

Uh oh!

zdrapela commented Mar 9, 2026

Uh oh!

rhdh-qodo-merge bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Previous review results

Looking for bugs?

Uh oh!

zdrapela commented Mar 9, 2026

Uh oh!

rhdh-qodo-merge bot commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zdrapela commented Mar 10, 2026

Uh oh!

github-actions bot commented Mar 10, 2026

Uh oh!

github-actions bot commented Mar 11, 2026

Uh oh!

github-actions bot commented Mar 11, 2026

Uh oh!

sonarqubecloud bot commented Mar 11, 2026

Quality Gate passed

Uh oh!

github-actions bot commented Mar 11, 2026

Uh oh!

zdrapela commented Mar 11, 2026

Uh oh!

zdrapela commented Mar 11, 2026

Uh oh!

zdrapela left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gustavolira commented Mar 4, 2026 •

edited

Loading

rhdh-qodo-merge bot commented Mar 4, 2026 •

edited

Loading

(Review updated until commit `fdca364`)

rhdh-qodo-merge bot commented Mar 9, 2026 •

edited

Loading

rhdh-qodo-merge bot commented Mar 9, 2026 •

edited

Loading