Fix proxy leaks and stale paths in produce()#44
Conversation
Reading a nested container from a draft returns a proxy wrapper. When a recipe wrote such a value back into the draft (setitem, append, insert, extend, update, setdefault, slice assignment), the proxy object itself was stored in the result and deepcopied into the patch values, making results unserializable and patches inconsistent with the result. - Unwrap values at every write entry point: proxies (also when nested inside plain dicts/lists/tuples) are replaced by plain deep copies of their data. Assigning a proxy thus has value semantics, which is the only semantics JSON patches can express. - Replace copy.deepcopy in PatchRecorder with a hand-rolled _snapshot that copies JSON-like types directly (and unwraps proxies), falling back to deepcopy for unknown types such as observ proxies. - Add __deepcopy__ to all proxy classes so any deepcopy of a proxy yields plain data. - setdefault now returns through __getitem__ so mutations on the returned container are tracked (previously they went unrecorded). Three known limitations remain as strict xfails, to be addressed by parent-linked live paths (Tier 2): mutating a detached item records against its stale path, and held proxies record stale indices after list shifts. Benchmarks are unchanged within noise. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
pop() tested the truthiness of the default instead of whether one was provided, so pop(key, None), pop(key, 0) or pop(key, "") raised KeyError on a missing key. Use a sentinel to match dict.pop semantics. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Proxies previously stored an absolute path at creation time, which went stale as the tree changed: a held proxy recorded against its old index after list shifts (silently corrupting patches), and proxies for removed values kept recording against paths that no longer existed. Proxies now store (parent, key) and compute their path on demand by walking up to the root. The child-proxy cache doubles as a registry of handed-out proxies, so structural changes can keep them correct: - insert/delete/pop/slice ops on lists reindex cached child proxies instead of discarding them; sort and reverse remap them by element identity. - Removing or replacing a value detaches its proxy (transitively, via the parent chain). Detached proxies still mutate their data but stop recording: the mutations are captured by the snapshot of whatever write re-inserts the data later. - Assigning a detached proxy directly back into the draft re-attaches it at the new location (move semantics), so held references stay live across detach/reattach regrouping. A proxy still attached elsewhere is copied, and a cycle guard prevents adopting an ancestor. - list.insert now clamps out-of-range and negative indices like list.insert does, so recorded paths match the actual insertion point. This turns the three xfailed scene-tree tests green and adds a suite covering re-attachment, detachment and index-shift scenarios. Benchmark medians are at parity with the baseline: location tokens are built in a single pass with a fast path for root-level proxies, and patch paths are constructed with a single Pointer allocation, matching the allocation count of the previous absolute-path scheme. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Cover the remaining produce.py branches with behavioral tests: tuple and frozenset values in patch snapshots, proxies nested inside tuples, deepcopy of draft proxies inside a recipe, and extended-slice (step) assignment/deletion with held child proxies. The proxy branch of _snapshot is unreachable through public flows (write paths pre-unwrap values) and is covered directly as defense in depth. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Three improvements, all verified against the pre-branch benchmark baseline: - Build reverse patches with append + a single reverse() in finalize() instead of insert(0, ...) per patch, which made recording O(n^2) in the number of mutations (73x slower at 20k patches). - Create copy-mode drafts with _snapshot instead of copy.deepcopy (~2x faster for JSON-like data; unknown types still fall back to deepcopy). Unlike deepcopy, _snapshot does not preserve shared references within the base; aliased subtrees become independent copies, which is what patches can express anyway. - Skip the _snapshot call for scalar values in the recorder, saving a function call per recorded patch. All produce benchmarks now run at or faster than the master baseline by median; the largest wins are copy-mode recipes (nested structure -38%, dict mutations -15%, sparse-on-large-object -10%). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Two follow-up commits:
🤖 Generated with Claude Code |
Parent links introduced reference cycles (child -> parent -> child cache), turning every proxy into cyclic garbage that only the gc could reclaim. The resulting collector pauses showed up as large latency spikes: p99 for a small nested produce() was 190us with gc enabled vs 6us with gc disabled, inflating benchmark means (with huge stddev) while medians were unaffected. The recorder now registers every proxy it creates and finalize() breaks the cycles, so proxies are freed by reference counting again (p99 back to ~4us, zero cyclic objects left behind). Released proxies are also marked detached, so a proxy leaked out of a recipe can no longer record into the already-returned patch lists. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
The benchmark CI failure was real, not a fluke — but subtle: the parent links from the live-paths commit created reference cycles ( Fixed in 990db03: the recorder registers every proxy it creates, and 🤖 Generated with Claude Code |
The CI benchmark gate kept flagging test_produce_observ_nested_in_place (+5-9% mean), a benchmark dominated by fixed per-produce() bookkeeping: proxy creation, the proxy registry and the cycle-breaking walk in finalize(). Measured locally against master with interleaved runs, the overhead was a real +3.4%. - Parent links are now weak references, so proxies never form reference cycles in the first place: no registry, no cycle-breaking walk, and proxies are freed by reference counting. A dead parent reference means a detached ancestor was collected, which is treated as detached - the semantics the parent chain already implied. - finalize() now only reverses the reverse patches and detaches the root proxy; leaked proxies stay inert because every attached proxy computes its path through the (now detached) root. - The child-proxy cache dict is allocated lazily on first wrap; most proxies are leaves that never hand out children. This halves the regression to +1.9% on that benchmark while all other produce benchmarks remain at or faster than the master baseline. Also gate the CI benchmark comparison on median instead of mean: means on shared runners are easily skewed by scheduler/gc latency spikes (test_list_diff_identical showed +16% mean on code this branch does not touch). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Follow-up on the two benchmark CI failures (b85b8bf):
🤖 Generated with Claude Code |
Changing the CI benchmark gate is out of scope for this PR. The check is optional and used to understand what is happening; the remaining ~2% mean regression on test_produce_observ_nested_in_place is the known fixed per-produce() bookkeeping cost of live paths and safe teardown. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Seems to be pretty ok: some improvements in performance and only a few minor 'regressions' but nothing too significant. But above all it should work now as intended which is more important 😅 |
Problem
Reading a nested container from a
produce()draft returns a proxy (DictProxy/ListProxy/SetProxy). Writing such a value back into the draft leaked the proxy object itself into both the result and the recorded patch values:It was extra sneaky because
reprpasses through, so the leaked proxy printed like a plain dict. Two deeper problems surfaced while investigating, both hit by scene-tree regrouping (move an item, nest it in a new group, keep editing it):IndexError.insert/del/pop/slices), silently corrupting patches: replaying them edited the wrong element.Fix
Proxies never leak into results or patches (f0e604b)
PatchRecordersnapshots with a hand-rolled_snapshot(faster thancopy.deepcopyfor JSON-like data, falls back todeepcopyfor unknown types such as observ proxies).__deepcopy__returning plain data (mirroring observ's fix in Fix deepcopy of Proxy objects creating unregistered zombie proxies observ#165).setdefaultnow returns through__getitem__, so mutations on the returned container are tracked (previously they went unrecorded).DictProxy.popfalsy defaults (1f80f71)pop(key, None)/pop(key, 0)raisedKeyErrorbecause the code tested the truthiness of the default instead of its presence. Now uses a sentinel, matchingdict.pop.Live paths through parent links (de991c6)
(parent, key)and compute their path on demand by walking up to the root; the child-proxy cache doubles as a registry of handed-out proxies.sort/reverseremap them by element identity.list.insertnow clamps out-of-range/negative indices likelist.insert, so recorded paths match the actual insertion point.Semantics
Assigning a proxy that is still attached elsewhere has value semantics (a copy is stored) — the only semantics JSON Patch can express, and it matches the scene-tree invariant that a node has exactly one parent. Assigning a detached proxy is a move. The invariant throughout:
apply(base, patches) == resultandapply(result, reverse) == base.Tests & performance
test_produce_proxy_leak.py(every write entry point, serialization, aliasing round-trips) andtest_produce_live_paths.py(re-attachment, detachment, index shifts, sort/reverse, full regroup scenario). 335 tests pass.master: location tokens are built in a single pass with a fast path for root-level proxies, patch paths use a singlePointerallocation, and scalar writes skip the unwrap/adopt machinery via an inline type check.🤖 Generated with Claude Code