feat(handler): on-the-fly package install for deployed mode #77
When a deployed handler fails to import due to a missing package (e.g. excluded from the build artifact), the worker now attempts to install it via DependencyInstaller and retries the handler load. This prevents fatal crashes for CPU endpoints that need packages not in the slim base image. Logs a warning on each on-the-fly install so users are aware of the cold start penalty and can add the package to their dependencies list. Capped at 3 recovery attempts with a guard against retrying the same package, preventing unbounded install loops.
Pull request overview
Adds a recovery path for Flash-deployed workers to automatically install missing Python packages during generated-handler import failures, then retry loading (bounded by a max-attempts cap), with tests covering success/failure/guard behavior.
Changes:
- Introduces `_exec_handler_module` recovery loop that installs missing packages and retries handler module execution.
- Adds `MAX_IMPORT_RECOVERY_ATTEMPTS` constant to cap recovery behavior.
- Expands unit tests to cover successful recovery, install failure, and retry-guard behavior.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/handler.py` | Adds missing-package detection, on-the-fly install, retry logic, and a dedicated recovery error type. |
| `src/constants.py` | Adds `MAX_IMPORT_RECOVERY_ATTEMPTS` configuration for the recovery loop. |
| `tests/unit/test_handler.py` | Adds/updates tests validating recovery success, install failure behavior, and retry guardrails. |
- Restrict recovery to `ModuleNotFoundError` instead of broad `ImportError`
- Fix off-by-one: add extra exec attempt after final install (MAX+1 iterations)
- Add `importlib.invalidate_caches()` after on-the-fly install
- Improve error message for failed recovery (actionable, not speculative)
- Add explicit spec/loader assertions in test for clearer failure diagnostics
- Update tests to raise `ModuleNotFoundError` to match new except clause
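To illustrate why the narrower except clause matters, here is a small sketch under stated assumptions. `ModuleNotFoundError` is a subclass of `ImportError` raised when a module cannot be located, whereas a plain `ImportError` can also mean a symbol is missing from a module that is already installed, which an install cannot fix. The helper names `missing_package_name` and `retry_import` are hypothetical and are not taken from the PR.

```python
import importlib


def missing_package_name(exc: BaseException):
    """Return the top-level name of the missing module, or None.

    Only ModuleNotFoundError is treated as recoverable; a plain ImportError
    (e.g. "cannot import name ...") returns None.
    """
    if not isinstance(exc, ModuleNotFoundError):
        return None
    if not exc.name:
        return None
    # "pkg.sub.mod" -> "pkg": only the top-level package can be installed.
    return exc.name.split(".")[0]


def retry_import(module_name: str):
    """Import a module after refreshing the import system's finder caches."""
    # Recommended by the importlib docs when modules are created or
    # installed while the program is running.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```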
runpod-Henrik left a comment
What changed: When a deployed handler fails at startup because a package is missing from the build artifact, the worker now attempts to install it on-the-fly and retries. Capped at 3 installs. Recovery failures surface as descriptive errors recommending flash deploy.
What was tested:
- Successful on-the-fly install → handler loads on retry
- Failed install → descriptive error raised, worker does not hang
- Same package failing repeatedly after install → raises immediately, no unbounded retries
- Non-deployed mode → unaffected, recovery code not reached
What works: All three recovery paths behave as described.
Verdict: Pass. The recovery path fires only when a package is missing from the build artifact — after the companion fix to the build exclusion list, this occurs only for explicitly excluded packages or build pipeline failures. Both branches (install succeeds, install fails) are covered; failure surfaces as a clear, actionable error.
Summary
- When a deployed handler fails to import because of a missing package, the worker installs it via `DependencyInstaller` and retries the handler load
- Capped at `MAX_IMPORT_RECOVERY_ATTEMPTS` (3) with a guard against retrying the same package
- Dedicated `_HandlerRecoveryError` to distinguish recovery failures from user-code `RuntimeError`s

Why: CPU endpoints using `python-slim` may be missing packages that were excluded from the build artifact. This provides a safety net that prevents fatal crashes while logging actionable warnings.

Companion PRs:
Test plan
- `make quality-check` passes
- `test_recovery_installs_missing_package_and_retries` -- happy path
- `test_raises_on_import_error_when_install_fails` -- install failure
- `test_recovery_stops_if_same_package_fails_twice` -- retry guard