fix: warm OneSignalDispatchers on init to avoid cold-start ANRs #2645
Merged
Contributor
📊 Diff Coverage Report (Changed Lines Only)
Gate: aggregate coverage on changed executable lines must be ≥ 80% (JaCoCo line data for lines touched in the diff).
Overall (aggregate gate): 22/26 touched executable lines covered (84.6%, requires ≥ 80%).
abdulraqeeb33 pushed a commit that referenced this pull request on May 12, 2026
…ionController, FeatureFlagsRefreshService) behind sdk_background_threading FF
Expand SDK-4506's scope to cover the other two lifecycle handlers that the
ANR-dump analysis from logs/2026-05-12 surfaced as also stalling the main
thread on cold start under `sdk_background_threading`:
* NotificationPermissionController polling lifecycle listener — reads
`_configModelStore.model.foregroundFetchNotificationPermissionInterval`
and calls `pollingWaiter.wake()`, which dispatches a coroutine resume
onto the IO pool. On cold start that hits the dispatcher / executor
lazy chain inside OneSignalDispatchers and the construction cost is
paid on the calling (main) thread.
* FeatureFlagsRefreshService.onFocus / onUnfocused — calls
`OneSignalDispatchers.launchOnIO` directly via `restartForegroundPolling`,
same chain, same stall.
Both move to `runOnSerialIOIfBackgroundThreading` (introduced earlier in
this PR for NotificationsManager.onFocus). Identical rollout shape:
* FF on -> SerialIO single-thread executor, off-main, ordered globally
with BackgroundManager (SDK-4505) and NotificationsManager
(this PR).
* FF off -> inline on the lifecycle main thread (legacy behavior, retains
the ANR for the control cohort).
`FeatureFlagsRefreshService.onUnfocused` now qualifies the `synchronized`
receiver as `synchronized(this@FeatureFlagsRefreshService)` so the lambda
locks on the service instance — the same monitor `restartForegroundPolling`
takes — rather than on the (no-receiver) lambda object.
Tests:
* :core FeatureFlagsRefreshServiceTests asserts onFocus / onUnfocused
route through `runOnSerialIOIfBackgroundThreading` (3 invocations
across start + direct onFocus + direct onUnfocused).
* :notifications NotificationPermissionControllerTests asserts the
polling lifecycle listener's onFocus / onUnfocused both dispatch
through the helper.
These two call sites originally landed in the SDK-4507 PR (#2645) but
share the same root cause and rollout strategy as the
NotificationsManager.onFocus offload, so consolidating them here keeps
the FF rollout matrix one-knob simple and leaves SDK-4507 (#2645) to
focus purely on the prewarm fix.
:OneSignal:core + :OneSignal:notifications detekt + full unit suites
green.
Co-authored-by: Cursor <cursoragent@cursor.com>
7d9d160 to f72cc05
abdulraqeeb33 pushed a commit that referenced this pull request on May 12, 2026
…ing helper
Introduces the threading infrastructure that the follow-up PRs depend on. This
PR adds the helpers and tests; it does not change any production call sites.
What it adds
* OneSignalDispatchers.SerialIO
A single-thread, named ("OneSignal-SerialIO") CoroutineDispatcher backed
by Executors.newSingleThreadExecutor with a SupervisorJob + CoroutineScope.
Falls back to Dispatchers.IO.limitedParallelism(1) if executor construction
fails. Submission order on the dispatcher == execution order on its single
worker, which is exactly the semantics the focus / unfocus lifecycle
handlers need (see the next PR).
Companion: launchOnSerialIO { ... } and a SerialIO entry in
OneSignalDispatchers.getPerformanceMetrics() / getStatus().
* ThreadUtils.suspendifyOnSerialIO { ... }
Always-on serial dispatch. Wraps OneSignalDispatchers.launchOnSerialIO and
is intentionally NOT gated on ThreadingMode.useBackgroundThreading - some
code paths need ordered off-main execution unconditionally.
* ThreadUtils.runOnSerialIOIfBackgroundThreading { ... }
FF-gated wrapper for non-suspending blocks. When
ThreadingMode.useBackgroundThreading is true the block is dispatched to
SerialIO; when false the block runs inline on the calling thread. This is
the call shape every subsequent focus / unfocus handler in this series
uses, so the rollout matrix stays one-knob simple.
Block is non-suspending on purpose: the FF-off branch executes on the
caller's thread, and a suspending block there would force a runBlocking,
which defeats the purpose of an A/B comparison.
* IOMockHelper stubs the new helpers
suspendifyOnSerialIO + launchOnSerialIO are tracked by awaitIO() so
existing specs stay deterministic. runOnSerialIOIfBackgroundThreading is
stubbed inline-on-test-thread by default so existing call-site specs keep
their observable behavior; specs that want to exercise the FF-on (offload)
branch can override the stub.
Tests
* OneSignalDispatchersTests: new SerialIO cases - construction, lazy chain
activates on first launch, getStatus reports Active + queue size, falls
back to the limitedParallelism(1) path if executor construction fails.
getStatus + getPerformanceMetrics are refactored to extract executorStatus
+ scopeStatus inline helpers to keep them under Detekt's LongMethod /
ComplexMethod thresholds.
* ThreadUtilsFeatureFlagTests: new cases that suspendifyOnSerialIO always
routes through the serial dispatcher (FF-agnostic), and that
runOnSerialIOIfBackgroundThreading routes through the serial dispatcher
when the FF is on and runs inline when the FF is off.
Why a dedicated serial dispatcher (not just suspendifyOnIO)
Multi-thread IO pools don't guarantee submission order = execution order. A
rapid focus burst (activity restart, share flow popping the activity back/
forth) could otherwise interleave cancel/schedule pairs or session-state
mutations. Pinning order-sensitive lifecycle work to a single executor keeps
it globally ordered, and future per-event work (focus counters, session
timing, analytics) inherits the guarantee for free.
:OneSignal:core detekt + full unit suite green. No production behavior change
in this PR; the follow-up PRs land the call-site offloads (#2644) and the
dispatcher prewarm (#2645).
Co-authored-by: Cursor <cursoragent@cursor.com>
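The ordering and rollout behavior described in this commit message can be illustrated with a minimal sketch. It uses only `java.util.concurrent` rather than coroutines, and everything in it is an assumption made for illustration: the top-level `useBackgroundThreading` flag stands in for `ThreadingMode.useBackgroundThreading`, `serialIo` stands in for `OneSignalDispatchers.SerialIO`, and `recordBurst` is a hypothetical helper. It is not the SDK's implementation.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.CountDownLatch
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for ThreadingMode.useBackgroundThreading.
var useBackgroundThreading = true

// One worker thread, so tasks run in exactly the order they were submitted.
val serialIo: ExecutorService = Executors.newSingleThreadExecutor { r ->
    Thread(r, "OneSignal-SerialIO").apply { isDaemon = true }
}

// FF on: hand the block to the serial executor and return immediately.
// FF off: run the block inline on the calling thread (legacy behavior).
fun runOnSerialIOIfBackgroundThreading(block: () -> Unit) {
    if (useBackgroundThreading) {
        serialIo.execute { block() }
    } else {
        block()
    }
}

// Submits a rapid unfocus -> focus -> unfocus burst and records where each ran.
fun recordBurst(): List<String> {
    val events = CopyOnWriteArrayList<String>()
    listOf("unfocus", "focus", "unfocus").forEach { name ->
        runOnSerialIOIfBackgroundThreading {
            events.add("$name on ${Thread.currentThread().name}")
        }
    }
    // Queue a marker task and wait for it, so every earlier task has finished.
    val drained = CountDownLatch(1)
    serialIo.execute { drained.countDown() }
    drained.await(5, TimeUnit.SECONDS)
    return events.toList()
}

fun main() {
    useBackgroundThreading = true
    println(recordBurst()) // three events, all on OneSignal-SerialIO, in submission order
    useBackgroundThreading = false
    println(recordBurst()) // same order, but each ran inline on the calling thread
}
```

The block is deliberately a plain `() -> Unit`: with the flag off it simply runs on the caller, so no blocking bridge is needed for the control cohort.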
520ae2c to 0621414
f72cc05 to 681c9f3
0621414 to 6dc8889
6dc8889 to be6f168
681c9f3 to 71d40b4
…dk_background_threading FF
Wraps every IApplicationLifecycleHandler that does slow / blocking work on the
main thread with runOnSerialIOIfBackgroundThreading (introduced in #2643). All
five handlers share one rollout knob, one ordering guarantee (the SerialIO
single-thread executor), and one observable contract in tests.
The handlers + why they were ANR-ing
* BackgroundManager.onFocus / onUnfocused
Synchronous JobScheduler.cancel / .schedule on the main thread. Binder
transactions to system_server that can block for many seconds on Xiaomi /
MIUI under power-save. OTel insertId ycae33cjpu6gcyut shows a 20,796 ms
main-thread block on a 25078RA3EL / Android 15 device.
* NotificationsManager.onFocus
refreshNotificationState() drives NotificationRestoreWorkManager
.beginEnqueueingWork, which lazily constructs WorkManager (opens / migrates
the SQLite store at app_data/databases/androidx.work.workdb on first call)
and then writes a WorkSpec row. OTel insertId 9qy5s0ta0cwqwmb0 shows a
30,516 ms main-thread block on a vivo I2306 / Android 15 device.
Short-circuits on `restored = true` after the first call, so only the first
focus event per process eats the SQLite stall.
* NotificationPermissionController polling lifecycle listener
onFocus reads ConfigModel.foregroundFetchNotificationPermissionInterval and
calls pollingWaiter.wake(), which dispatches a coroutine resume onto the IO
pool via channel.trySend -> ThreadPoolExecutor.execute. On cold start that
hits the OneSignalDispatchers lazy chain (executor + dispatcher + scope
construction) on the calling thread - 26 / 500 main-thread ANRs in
logs/2026-05-12 sit on this stack. onUnfocused does the symmetric job of
pushing the polling interval to 1 day to effectively pause polling.
* FeatureFlagsRefreshService.onFocus / onUnfocused
onFocus -> restartForegroundPolling -> OneSignalDispatchers.launchOnIO, same
lazy chain stall - 18 / 500 ANRs in the same bucket.
onUnfocused cancels the poll job; we route the cancellation through the same
serial dispatcher so back-to-back focus -> unfocus stays globally ordered
with onFocus's polling-job swap, and `synchronized(this)` is qualified as
`synchronized(this@FeatureFlagsRefreshService)` so the lambda locks on the
service instance (the same monitor restartForegroundPolling takes) rather
than the no-receiver lambda object.
* SessionService.onFocus / onUnfocused
sessionLifeCycleNotifier.fire { onSessionStarted / Active } invokes the
registered session-lifecycle handlers (operation repo, IAM trigger eval,
etc.) synchronously, and the first one to touch OneSignalDispatchers pays
the cold-init cost on the main thread - 25 / 500 ANRs in logs/2026-05-12 sit
on this stack.
session.startTime / session.focusTime / activeDuration accounting is
preserved by capturing _time.currentTimeMillis on the caller's thread BEFORE
the wrapper and passing it into the deferred handleOnFocus /
handleOnUnfocused, so the timestamps reflect when Android delivered the
event, not when the serial dispatcher ran the block.
Rollout matrix (uniform across all five handlers)
* FF on -> runOnSerialIOIfBackgroundThreading { ... } dispatches to
OneSignalDispatchers.SerialIO (single-thread executor). Main thread returns
from handleFocus immediately.
* FF off -> the block runs inline on the lifecycle main thread. Legacy
behavior; retains the ANR for the control cohort so the A/B comparison
stays clean.
Activation is APP_STARTUP per FeatureFlag.kt, so a given session is latched on
one path and won't bounce mid-run.
Worth flagging that the production ANR samples for every handler in this PR
were on FF=ON - because all five previously bypassed every threading helper,
the FF did not gate any of these codepaths. This PR is what introduces the
gate.
Why the serial dispatcher specifically
All five handlers are invoked from the same main-thread fanout
(ApplicationService.handleFocus -> applicationLifecycleNotifier.fire).
A rapid focus burst on a multi-thread IO pool could interleave them with each
other and with the BackgroundManager cancel/schedule pair. Pinning all five to
the same single-thread executor keeps lifecycle work globally ordered on the
main-thread submission order, and future per-event work added to any of these
handlers (focus counters, notification analytics, session timing) inherits the
ordering guarantee for free.
Tests (all new specs pass; existing specs unchanged)
* BackgroundManagerTests: existing tests + FF-on (dispatches through
launchOnSerialIO in order) + FF-off (runs inline, does not dispatch) for
both cancel and schedule. Includes a rapid unfocus -> focus burst test that
pins both events through the serial dispatcher in submission order.
* NotificationsManagerTests: dispatch contract on onFocus + rapid focus burst
preserves submission order. Lambda body is observable (the test stub invokes
the captured block) so JaCoCo sees the refreshNotificationState() call
covered.
* NotificationPermissionControllerTests: dispatch contract for the polling
lifecycle listener on both onFocus and onUnfocused. Existing polling
integration tests still pass under the FF-off default.
* FeatureFlagsRefreshServiceTests: onFocus + onUnfocused route through
runOnSerialIOIfBackgroundThreading.
* SessionServiceTests: existing state-mutation assertions still pass under the
FF-off default (the wrapper runs inline). New assertions for the dispatch
contract on onFocus + onUnfocused + the rapid burst.
:OneSignal:core + :OneSignal:notifications detekt + full unit suites green.
Co-authored-by: Cursor <cursoragent@cursor.com>
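The SessionService note above (capture the timestamp on the caller's thread before the wrapper, then pass it into the deferred handler) can be sketched as follows. This is an illustration under stated assumptions, not the SDK's code: `SessionSketch`, `serialExecutor`, the injected `clock`, and `focusTimeAfterDelayedDispatch` are all hypothetical names, and a plain single-thread executor stands in for the serial dispatcher.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for the serial dispatcher.
val serialExecutor = Executors.newSingleThreadExecutor { r ->
    Thread(r, "serial-io-sketch").apply { isDaemon = true }
}

// Hypothetical miniature of the session state touched by a focus event.
class SessionSketch {
    @Volatile
    var focusTime: Long = 0L

    // Deferred part: runs later on the serial thread, using the captured time.
    fun handleOnFocus(eventTimeMs: Long) {
        focusTime = eventTimeMs
    }
}

fun onFocus(session: SessionSketch, clock: () -> Long, done: CountDownLatch) {
    // Read the clock on the caller's thread, BEFORE handing off the work.
    val eventTimeMs = clock()
    serialExecutor.execute {
        Thread.sleep(50) // simulate the block waiting in the serial queue
        session.handleOnFocus(eventTimeMs)
        done.countDown()
    }
}

fun focusTimeAfterDelayedDispatch(): Long {
    val session = SessionSketch()
    val done = CountDownLatch(1)
    var fakeNow = 1_000L
    onFocus(session, { fakeNow }, done)
    fakeNow = 9_999L // the clock moves on while the block is still queued
    done.await(5, TimeUnit.SECONDS)
    return session.focusTime
}

fun main() {
    println(focusTimeAfterDelayedDispatch()) // prints 1000
}
```

Had the clock been read inside the dispatched block instead, the recorded focus time would include however long the block waited in the queue.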
…old-start ANRs
ANR-dump analysis (logs/2026-05-12, 500 entries on sdk_background_threading)
shows 23 / 500 (4.6%) of ANRs ending in SyncJobService.onStartJob ->
suspendifyOnIO, all bottoming out in the same OneSignalDispatchers lazy chain:
    ThreadPoolExecutor.execute -> LinkedBlockingQueue.offer
    CoroutineDispatcher.dispatch -> kotlinx.coroutines first-launch
    OneSignalDispatchers.IOScope.<init> (by lazy)
    OneSignalDispatchers.IO (by lazy)
    OneSignalDispatchers.ioExecutor (by lazy)
The first IO consumer in the process pays the executor + dispatcher + scope
construction + the kotlinx.coroutines MainDispatcherFactory ServiceLoader scan
on its thread. Under sdk_background_threading whichever main-thread caller wins
the race eats 5-20s before the watchdog fires.
#2644 routes the five known onFocus / onUnfocused handlers through
runOnSerialIOIfBackgroundThreading so they no longer fire on main, but the
deeper structural problem is the lazy chain itself - a future call site that
slips past the FF gate (or a JobService delivered to main before init has run)
hits the same stall.
OneSignalDispatchers.prewarm() spawns a dedicated short-lived
"OneSignal-prewarm" daemon thread that submits one empty launch on each of
IO / Default / SerialIO. That single thread pays the lazy-init cost end-to-end
so the next production caller - even on the main thread - only sees the cheap
"submit work to an already-constructed executor" cost.
* Idempotent: double-checked-locked prewarmStarted flag, so repeat calls from
init / suspend init / SyncJobService.onStartJob no-op cheaply. An internal
resetPrewarmForTest() lets specs exercise the "first call wins" branch
independently.
* Fire-and-forget: failures log and swallow. The existing Dispatchers.IO /
SerialIO fallback paths in [IO] / [SerialIO] still apply if anything goes
wrong, so a failed prewarm just means the first real caller pays the original
cost.
* Daemon thread at NORM_PRIORITY - 2 so prewarm never blocks process exit or
starves UI work.
Called from:
* OneSignalImp.initWithContext(context, appId) (sync variant)
* OneSignalImp.initWithContextSuspend(context, appId) (suspend variant, used by
re-entrant suspend callers)
* SyncJobService.onStartJob BEFORE suspendifyOnIO (JobService can fire before
the host app init runs)
Tests (:core OneSignalDispatchersTests)
* prewarm returns immediately on the caller and the daemon thread brings IO /
Default / SerialIO + their scopes to Active.
* prewarm is idempotent - second call does not spawn another OneSignal-prewarm
thread (verified via thread-name scan).
Stacked on #2644. Together with #2643 and #2644 this covers the full 95 / 500
main-thread-ANR bucket from logs/2026-05-12 attributable to SDK threading
helpers (47 onFocus + 23 JobService + 25 SessionService).
:OneSignal:core detekt + full unit suite green.
Co-authored-by: Cursor <cursoragent@cursor.com>
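The "first consumer pays" behavior of a `by lazy` chain, which this commit message identifies as the root cause, can be reproduced in a few lines. The sketch below is illustrative only: `LazyChainSketch`, `constructedOn`, and `firstTouchFrom` are hypothetical names, and a small fixed thread pool stands in for the real executor. It records which thread ran the lazy initializer.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical miniature of a holder whose executor is built lazily.
object LazyChainSketch {
    // Name of the thread that ended up running the lazy initializer.
    @Volatile
    var constructedOn: String? = null

    val ioExecutor: ExecutorService by lazy {
        // Construction runs on whichever thread touches the property first.
        constructedOn = Thread.currentThread().name
        Executors.newFixedThreadPool(2) { r ->
            Thread(r, "io-sketch").apply { isDaemon = true }
        }
    }
}

// Touches the executor from a thread with the given name, then reports
// which thread actually paid for construction.
fun firstTouchFrom(threadName: String): String? {
    val caller = Thread({ LazyChainSketch.ioExecutor.execute { } }, threadName)
    caller.start()
    caller.join()
    return LazyChainSketch.constructedOn
}

fun main() {
    println(firstTouchFrom("pretend-main")) // prints pretend-main
    println(firstTouchFrom("later-caller")) // still prints pretend-main
}
```

If the first caller happens to be the main thread, the whole construction cost lands there, which is the situation a prewarm on a background thread is meant to avoid.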
be6f168 to abe5633
71d40b4 to 2c9ac93
Contributor
Author
@copilot resolve the merge conflicts in this pull request
Co-authored-by: abdulraqeeb33 <2866528+abdulraqeeb33@users.noreply.github.com>
Contributor
Resolved and pushed in 8372bf9. I merged latest
Warning: Firewall rules blocked me from connecting to one or more addresses.
fadi-george approved these changes on May 12, 2026
nan-li approved these changes on May 12, 2026
This was referenced May 13, 2026
Summary
Move the `OneSignalDispatchers` lazy-chain construction cost off the main thread on cold start. Closes the structural root cause that #2644 covers tactically with FF-gated wrappers: a future call site that slips past the FF gate (or a `JobService` delivered to main before init has run) would otherwise still pay the construction cost on the calling thread.
Linear: SDK-4512
Base branch: `ar/sdk-4506-notifications-onfocus-anr` (#2644)
Note: the branch name `ar/sdk-4507` is numerically sequential with the rest of this series but is unrelated to Linear `SDK-4507` (which is an unrelated Web ticket). The tracking ticket for this PR is `SDK-4512`.
Motivation
ANR-dump analysis of `logs/2026-05-12` (500 ANR entries, all on `sdk_background_threading`) shows 23 / 500 (4.6%) of ANRs ending in `SyncJobService.onStartJob -> suspendifyOnIO`, all bottoming out in the same `OneSignalDispatchers` lazy chain (`ioExecutor` -> `IO` -> `IOScope`, each `by lazy`).
The first IO consumer in the process pays the executor + dispatcher + scope construction + the kotlinx.coroutines `MainDispatcherFactory` `ServiceLoader` scan on its thread. Under `sdk_background_threading` whichever main-thread caller wins the race eats 5-20 s before the watchdog fires.
#2644 routes the five known onFocus / onUnfocused handlers through `runOnSerialIOIfBackgroundThreading` so they no longer fire on main, but the deeper structural problem is the lazy chain itself: a future call site that slips past the FF gate (or a `JobService` delivered to main before init has run) hits the same stall.
Fix: `OneSignalDispatchers.prewarm()`
* Spawns a dedicated short-lived `OneSignal-prewarm` daemon thread that submits one empty launch on each of `IO` / `Default` / `SerialIO`.
* That single thread pays the lazy-init cost end-to-end (executor, dispatcher, and scope construction, the `MainDispatcherFactory` `ServiceLoader` scan, worker thread spin-up) so the next production caller, even on the main thread, only sees the cheap submit-to-already-constructed-executor path.
* Idempotent: double-checked-locked `prewarmStarted` flag, so repeat calls from init / suspend init / `SyncJobService.onStartJob` no-op cheaply. An internal `resetPrewarmForTest()` lets specs exercise the "first call wins" branch independently.
* Daemon thread at `NORM_PRIORITY - 2` so prewarm never blocks process exit or starves UI work.
* Fire-and-forget: failures log and swallow. The existing `Dispatchers.IO` / `SerialIO` fallback paths still apply if anything goes wrong, so a failed prewarm just means the first real caller pays the original cost.
Called from:
* `OneSignalImp.initWithContext(context, appId)` (sync variant)
* `OneSignalImp.initWithContextSuspend(context, appId)` (suspend variant, used by re-entrant suspend callers)
* `SyncJobService.onStartJob` before `suspendifyOnIO`, because the `JobService` can fire before the host app's `initWithContext` runs and would otherwise be the first IO consumer.
Base branch
Stacked on `ar/sdk-4506-notifications-onfocus-anr` (#2644). Together with #2643 and #2644 this covers the full 95 / 500 main-thread-ANR bucket from `logs/2026-05-12` attributable to SDK threading helpers (47 onFocus + 23 JobService + 25 SessionService).
Testing
Static
* `:OneSignal:core:detekt`: clean.
Automated
* `:OneSignal:core:testReleaseUnitTest` full suite: green, including the new `OneSignalDispatchersTests` prewarm cases:
* `prewarm` returns immediately on the caller and the daemon thread brings `IO` / `Default` / `SerialIO` and their scopes to `Active`.
* `prewarm` is idempotent: second call does not spawn another `OneSignal-prewarm` thread (verified via thread-name scan).
Manual
Cold-start ANR rate on the `sdk_background_threading=ON` cohort should drop ~9 % from the prewarm fix alone (the 47 onFocus entries also benefit indirectly because the FF-on path no longer races for first-IO-consumer status).
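For readers who want to see the shape of the prewarm described above, here is a minimal sketch using plain executors instead of coroutine dispatchers. It is an assumption-laden illustration, not the merged code: `PrewarmSketch`, `threadsSpawned`, `daemonFactory`, and the `Thread?` return value (added only so callers and tests can join the thread) are hypothetical. It shows the double-checked flag, the low-priority daemon thread, and the log-and-swallow failure handling.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.ThreadFactory
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical miniature of a dispatcher holder with a prewarm hook.
object PrewarmSketch {
    // Counts how many prewarm threads were ever started (for the idempotency check).
    val threadsSpawned = AtomicInteger(0)

    val io: ExecutorService by lazy {
        Executors.newFixedThreadPool(2, daemonFactory("io-sketch"))
    }
    val serialIo: ExecutorService by lazy {
        Executors.newSingleThreadExecutor(daemonFactory("serial-io-sketch"))
    }

    @Volatile
    private var prewarmStarted = false
    private val lock = Any()

    // Returns the spawned thread on the first call and null on every later call.
    fun prewarm(): Thread? {
        if (prewarmStarted) return null // fast path: no lock once started
        synchronized(lock) {
            if (prewarmStarted) return null // another caller won the race
            prewarmStarted = true
        }
        val worker = Thread({
            try {
                // Touching each lazy property here pays the construction cost
                // on this background thread instead of on a later caller.
                io.execute { }
                serialIo.execute { }
            } catch (e: Exception) {
                // Fire-and-forget: log and swallow, the first real caller
                // simply pays the original cost.
                println("prewarm failed: ${e.message}")
            }
        }, "OneSignal-prewarm")
        worker.isDaemon = true // never blocks process exit
        worker.priority = Thread.NORM_PRIORITY - 2 // stays below UI work
        threadsSpawned.incrementAndGet()
        worker.start()
        return worker
    }

    private fun daemonFactory(name: String) = ThreadFactory { r ->
        Thread(r, name).apply { isDaemon = true }
    }
}

fun main() {
    PrewarmSketch.prewarm()?.join() // first call spawns the thread
    PrewarmSketch.prewarm() // second call is a cheap no-op
    println(PrewarmSketch.threadsSpawned.get()) // prints 1
}
```

The caller returns as soon as the thread is started; joining it in `main` is only there so the example finishes deterministically.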