OCPBUGS-86511: Fix flaky TestAsyncCache backend test#16495
Conversation
Replace timing-dependent sleep loop with require.Eventually and re-grab the reference item after Run() fires its immediate reload. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@jhadvig: This pull request references Jira Issue OCPBUGS-86511, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@jhadvig: This pull request references Jira Issue OCPBUGS-86511, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jhadvig The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@jhadvig: This pull request references Jira Issue OCPBUGS-86511, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughThe TestAsyncCache test in Suggested reviewers
🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/serverutils/asynccache/asyncccache_test.go`:
- Around line 31-33: The test mutates package-level timing variables
initializationRetryInterval and initializationTimeout and does not restore them;
save their current values at the start of the test and register a t.Cleanup that
restores them (use the same variable names) so other tests aren’t affected;
update the test in asyncccache_test.go to capture prevRetry :=
initializationRetryInterval and prevTimeout := initializationTimeout and call
t.Cleanup(func() { initializationRetryInterval = prevRetry;
initializationTimeout = prevTimeout }) immediately after changing them.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 15df2c83-3a18-48e9-ab0c-0b09cd5754cb
📒 Files selected for processing (1)
pkg/serverutils/asynccache/asyncccache_test.go
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*_test.go
📄 CodeRabbit inference engine (Custom checks)
**/*_test.go: When new Ginkgo e2e tests are added, check for IPv4 assumptions: hardcoded IPv4 addresses (10.0.0.1, 192.168.1.1, 172.16.0.0/12), IPv4 localhost (127.0.0.1 without ::1), IPv4-only IP parsing, hardcoded IPv4 CIDRs in Service/Endpoint objects, hardcoded net.ParseIP/ParseCIDR values, IPv4-only pod/node IP assumptions, IPv4-only network policies, and URLs built without brackets for IPv6 (use net.JoinHostPort instead)
When new Ginkgo e2e tests are added, check for external connectivity requirements: connections to public internet hosts, pulling images from public registries without mirrors, downloading from external URLs, DNS resolution of public hostnames, and connections to external APIs/services outside the cluster. If requiring external connectivity that cannot be adapted, add [Skipped:Disconnected] to test name
When new Ginkgo e2e tests are added, check for usage of unavailable MicroShift APIs: Project/ProjectRequest, Build/BuildConfig, DeploymentConfig, ClusterOperator, ClusterVersion, Etcd operator, CSV/OLM resources, MachineSet/Machine/MachineHealthCheck, ClusterAutoscaler, Console, Monitoring stack, ImageRegistry operator, Samples operator, OperatorHub/CatalogSource, CloudCredential, Storage operator, Network operator CRDs, and any OpenShift API except Route and SecurityContextConstraints. Also flag references to namespaces openshift-kube-apiserver, openshift-kube-controller-manager, openshift-kube-scheduler. If flagged and test is intentional, use [Skipped:MicroShift] label or [apigroup:...] tag or guard with exutil.IsMicroShiftCluster() check
OpenShift Tests Extension (OTE) binaries must output valid JSON to stdout at process level. Flag any non-JSON stdout writes in main(), init(), TestMain(), BeforeSuite(), AfterSuite(), SynchronizedBeforeSuite(), RunSpecs() setup, or top-level var/const initializers. Common violations: klog writing to stdout (use klog.SetOutput(os.Stderr) or klog.LogToStderr(true)), fmt.Print/Println/Printf to stdout in main...
Files:
pkg/serverutils/asynccache/asyncccache_test.go
**/*.{yaml,yml,go}
📄 CodeRabbit inference engine (Custom checks)
When deployment manifests, operator code, or controllers are added/modified, check for scheduling constraints assuming standard HA topology (3+ CP nodes, dedicated workers). Flag: required pod anti-affinity with hostname topology key, pod topology spread with DoNotSchedule and hostname key, replica counts derived from node count, nodeSelector/affinity targeting control-plane nodes, broad tolerations that could schedule to resource-constrained arbiter nodes, assuming dedicated worker nodes exist, or PodDisruptionBudgets designed for 3+ nodes. Consider SNO (1 node), DualReplica (2 CP nodes), HighlyAvailableArbiter (2 CP + 1 arbiter), and External (HyperShift) topologies
Files:
pkg/serverutils/asynccache/asyncccache_test.go
**/*.go
📄 CodeRabbit inference engine (STYLEGUIDE.md)
**/*.go: All Go code should be formatted by gofmt
Import statement packages in Go should be separated into 3 groups: stdlib, external dependency, current project
Use typed errors withNewInvalidFlagError,FatalIfFailedpatterns in Go error handling
Use middleware composition, method-based routing, and consistent JSON responses viaserverutils.SendResponsefor HTTP handlers in Go
Apply security headers, CSRF protection, and proper token validation in Go code
Useklogwith appropriate levels (V(4) for debug, Error, Fatal) for logging in Go
Use YAML-based configuration with comprehensive flag validation in Go
Separate K8s client config from client creation, use both typed and dynamic clients in Go
Define clear interfaces in Go for testability and dependency injectionUse Go 1.25 or higher for backend development
Files:
pkg/serverutils/asynccache/asyncccache_test.go
pkg/**/*.go
📄 CodeRabbit inference engine (STYLEGUIDE.md)
Follow domain-based package structure in Go
/pkg/directory (auth, server, proxy, etc.)
Files:
pkg/serverutils/asynccache/asyncccache_test.go
🔇 Additional comments (1)
pkg/serverutils/asynccache/asyncccache_test.go (1)
38-39: LGTM!Also applies to: 41-57, 61-67
| initializationRetryInterval = 5 * time.Millisecond | ||
| initializationTimeout = 10 * time.Millisecond | ||
|
|
There was a problem hiding this comment.
Restore overridden package-level timing vars with t.Cleanup.
This test mutates global knobs and leaves them changed, which can make other tests order-dependent/flaky. Save prior values and restore them in cleanup.
Suggested fix
+ oldInitializationRetryInterval := initializationRetryInterval
+ oldInitializationTimeout := initializationTimeout
initializationRetryInterval = 5 * time.Millisecond
initializationTimeout = 10 * time.Millisecond
+ t.Cleanup(func() {
+ initializationRetryInterval = oldInitializationRetryInterval
+ initializationTimeout = oldInitializationTimeout
+ })📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| initializationRetryInterval = 5 * time.Millisecond | |
| initializationTimeout = 10 * time.Millisecond | |
| oldInitializationRetryInterval := initializationRetryInterval | |
| oldInitializationTimeout := initializationTimeout | |
| initializationRetryInterval = 5 * time.Millisecond | |
| initializationTimeout = 10 * time.Millisecond | |
| t.Cleanup(func() { | |
| initializationRetryInterval = oldInitializationRetryInterval | |
| initializationTimeout = oldInitializationTimeout | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/serverutils/asynccache/asyncccache_test.go` around lines 31 - 33, The
test mutates package-level timing variables initializationRetryInterval and
initializationTimeout and does not restore them; save their current values at
the start of the test and register a t.Cleanup that restores them (use the same
variable names) so other tests aren’t affected; update the test in
asyncccache_test.go to capture prevRetry := initializationRetryInterval and
prevTimeout := initializationTimeout and call t.Cleanup(func() {
initializationRetryInterval = prevRetry; initializationTimeout = prevTimeout })
immediately after changing them.
|
@jhadvig: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest-required |
Analysis / Root cause:
The
TestAsyncCachetest fails intermittently in CI (example failure) withExpected 0 to be greater than 0. Two issues:wait.UntilWithContextfiresrunCacheimmediately whenRun()is called, replacing the cached pointer before the test loop's firstGetItem()— somatchesis always 0 on fast machines.Additionally,
initializationRetryIntervalandinitializationTimeoutwere set after the firstNewAsyncCachecall, having no effect on initialization.Solution description:
NewAsyncCacheso they take effect for the error-case testRun()fires its immediate reloadrequire.Eventuallyto tolerate slow CIrequire.EqualVerified with 50 consecutive runs (
go test -count=50) — all passing.Screenshots / screen recording:
N/A — backend test only.
Test setup:
No special setup required.
Test cases:
go test -v -run TestAsyncCache ./pkg/serverutils/asynccache/passes consistentlygo test -count=50 -run TestAsyncCache ./pkg/serverutils/asynccache/— 50/50 passesBrowser conformance:
N/A — backend test only.
Additional info:
Jira: https://redhat.atlassian.net/browse/OCPBUGS-86511
Summary by CodeRabbit