Reject stateful programs whose output aliases a state-write source #2713
Open
john-rocky wants to merge 1 commit into
When a function output Var feeds the value side of a coreml_update_state op, the Core ML runtime proxy crashes with a hard segmentation fault on load — no Python traceback, just a process exit. The pattern is natural to write when porting torch decoder transformers (return the merged tensor that was just stored in the KV cache), so the silent crash is a footgun. Add a backend-level validation that walks the source-Var graph backwards from each coreml_update_state value and raises a clear ValueError if it finds a model output along the way. The error names the offending output and the affected state, and points at the workaround. Existing programs that already follow the recommended pattern (return a non-buffer-shaped derived tensor) keep converting unchanged.
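The backward walk described above can be sketched on a toy graph. This is a minimal illustration, not the code in this PR: the `Var` and `Op` classes, `find_aliased_output`, `validate`, the `max_depth` default, and the error wording are all hypothetical stand-ins, and the sketch assumes the walk follows every input edge of each producing op (the real traversal may be narrower).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Var:
    """Toy stand-in for a MIL Var (hypothetical, not the coremltools class)."""
    name: str
    op: Optional["Op"] = None  # producing op; None for function inputs and states


@dataclass(eq=False)
class Op:
    """Toy stand-in for a MIL op: a type name plus named input Vars."""
    op_type: str
    inputs: Dict[str, Var] = field(default_factory=dict)


def find_aliased_output(update_op: Op, outputs: List[Var], max_depth: int = 8) -> Optional[Var]:
    """Walk backwards from the state-write value; return the first model output found."""
    output_ids = {id(v) for v in outputs}
    stack = [(update_op.inputs["value"], 0)]
    seen = set()
    while stack:
        var, depth = stack.pop()
        if id(var) in seen:
            continue
        seen.add(id(var))
        if id(var) in output_ids:
            return var
        # Stop at graph inputs and at the depth bound.
        if var.op is None or depth >= max_depth:
            continue
        for parent in var.op.inputs.values():
            stack.append((parent, depth + 1))
    return None


def validate(update_ops: List[Op], outputs: List[Var]) -> None:
    """Raise ValueError if any model output feeds a state write."""
    for op in update_ops:
        hit = find_aliased_output(op, outputs)
        if hit is not None:
            state = op.inputs["state"].name
            raise ValueError(
                f"output '{hit.name}' aliases the value written to state '{state}'; "
                "return a derived tensor instead"
            )


# Toy decoder step: merged = read_state(kv_cache) + x, then write merged back.
x = Var("x")
kv_cache = Var("kv_cache")
cached = Var("cached", Op("read_state", {"input": kv_cache}))
merged = Var("merged", Op("add", {"x": cached, "y": x}))
update = Op("coreml_update_state", {"state": kv_cache, "value": merged})
reduced = Var("reduced", Op("reduce_sum", {"x": merged}))

validate([update], [reduced])  # derived output: accepted
try:
    validate([update], [merged])  # output is the state-write value: rejected
except ValueError as err:
    print(err)
```

Comparing by `id` rather than by name keeps the check tied to the exact Var object, which matches the "same Var" wording of the description; the depth bound only limits how far the walk goes on deep graphs.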
Summary
The Core ML runtime proxy crashes with a segmentation fault (no Python traceback) when loading an mlprogram whose function output Var is the same Var that feeds a coreml_update_state op. It is easy to hit when porting a torch decoder: write to a KV cache with self.cache[:] = merged and then return merged.
This rejects the pattern at conversion time instead, with a ValueError that names the offending output and the affected state and points at a workaround.
_validate_no_state_write_aliased_with_output runs in backend/mil/load.py and walks back (bounded depth) from each coreml_update_state value to check whether any ancestor Var is also a model output. It only fires on the exact aliasing pattern, so models whose returned tensor does not feed the state-write chain are unaffected.
Test plan
New coremltools/test/ml_program/test_stateful_output_alias_guard.py:
test_aliasing_pattern_raises_clear_error: the aliasing forward raises the ValueError (asserts the message names the output, the state, and the workaround).
test_non_aliasing_pattern_converts: the reduced-output variant (return merged.sum(...)) still converts (guards against false positives).
Verified locally on macOS (the raise path runs entirely in the converter, before runtime load).
The non-aliasing case exercises full mlprogram serialization and runtime load, so it runs in CI.