Skip to content

Lazy watermark-based View cache on ConversationState#3335

Open
csmith49 wants to merge 10 commits into
mainfrom
lazy-view-watermark
Open

Lazy watermark-based View cache on ConversationState#3335
csmith49 wants to merge 10 commits into
mainfrom
lazy-view-watermark

Conversation

@csmith49
Copy link
Copy Markdown
Collaborator

@csmith49 csmith49 commented May 20, 2026

Refs: #3053
Replaces: #3324
Follow-up: #3328

Summary

Adds a lazily-updated, incrementally-maintained View cache to ConversationState, laying the groundwork for #3053. This is a drop-in replacement for #3324 that eliminates the synchronization complexity that was causing repeated review cycles.

The key difference: pull instead of push

PR #3324 used a push-based callback on EventLog._on_append to eagerly update the cached View on every write. That design coupled EventLog to View lifecycle management and created lock-ordering, callback-timing, and reentrancy problems that surfaced across 6+ review rounds (19 threads, 4 still unresolved).

This PR uses a pull-based watermark: the View is lazily brought up to date when state.view is read, by replaying only the events appended since the last read.

@property
def view(self) -> View:
    n = len(self._events)
    if n > self._view_watermark:
        for i in range(self._view_watermark, n):
            self._view.append_event(self._events[i])
        self._view_watermark = n
    return self._view

This eliminates every synchronization concern from #3324:

Concern from #3324 Why it disappears here
Callback fires inside file lock → deadlock risk No callbacks
synced_count diverges from _sync_from_disk Watermark just checks len(events)
Direct EventLog.append() bypasses cache Watermark detects any new events automatically
FIFOLock reentrancy in condense() / arun() No write-path coupling
_wire_view_sync() ordering No callbacks to wire
close() needs state-lock wrapping No write-path view mutation
condense() event emission moved inside lock No write-path view mutation

Comparison

Metric #3324 (push) This PR (pull)
Files changed 7 5
Lines added 557 123
EventLog changes _on_append, set_on_append, synced_count None
_default_callback changed Yes No
close() / condense() changed Yes (lock wrapping) No
Unresolved review threads 4 N/A
Hot-path cost O(1) per event O(k) per step (k ≈ 2–4 new events)
enforce_properties on hot path Never Never

Changes

openhands-sdk/openhands/sdk/conversation/state.py

  • _view: View and _view_watermark: int — derived, not serialized.
  • view read-only property — lazily catches up via watermark.
  • rebuild_view() — full View.from_events re-derivation for cold load, fork, error recovery.
  • create() resume path calls rebuild_view() after attaching the EventLog.

openhands-sdk/openhands/sdk/agent/agent.py

  • step() and astep() malformed-history handlers call state.rebuild_view() before emitting CondensationRequest.

openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py

  • fork() calls fork_conv._state.rebuild_view() after deep-copying events.

openhands-sdk/openhands/sdk/context/condenser/base.py

  • Docstring: view parameter is now documented as read-only.

Files NOT changed (unlike #3324)

  • event_store.py — no _on_append callback, no set_on_append().
  • _default_callback — stays as self._state.events.append(e).
  • close() — no lock-wrapping changes.
  • condense() — no lock-wrapping changes.

Tests

  • tests/sdk/conversation/test_state_view_cache.py (12 tests): empty view, append tracking, parity with View.from_events, condensation, hot-path enforce_properties avoidance, rebuild, idempotent reads, persistence resume.
  • tests/sdk/agent/test_agent_context_window_condensation.py (4 new tests): rebuild_view called on malformed-history recovery in both step() and astep().
  • All 1,254 tests pass across tests/sdk/conversation/, tests/sdk/agent/, tests/sdk/context/.

Backward compatibility

  • All public APIs preserved. New additive surface only: ConversationState.view, ConversationState.rebuild_view.
  • No behavior change: Agent.step still calls View.from_events(state.events) via prepare_llm_messages. The cached view is maintained but unused on the hot path — the follow-up PR (Consume cached View in prepare_llm_messages #3328) swaps the call site.

Follow-up

#3328 swaps Agent.step to read from state.view instead of rebuilding from scratch, delivering the actual perf win. That PR will need to be rebased onto this branch once this merges.


This PR was created by an AI agent (OpenHands) on behalf of @csmith49.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:a6ea6c6-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-a6ea6c6-python \
  ghcr.io/openhands/agent-server:a6ea6c6-python

All tags pushed for this build

ghcr.io/openhands/agent-server:a6ea6c6-golang-amd64
ghcr.io/openhands/agent-server:a6ea6c65e552ac866562a587527f00fbe3714b5b-golang-amd64
ghcr.io/openhands/agent-server:lazy-view-watermark-golang-amd64
ghcr.io/openhands/agent-server:a6ea6c6-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:a6ea6c6-golang-arm64
ghcr.io/openhands/agent-server:a6ea6c65e552ac866562a587527f00fbe3714b5b-golang-arm64
ghcr.io/openhands/agent-server:lazy-view-watermark-golang-arm64
ghcr.io/openhands/agent-server:a6ea6c6-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:a6ea6c6-java-amd64
ghcr.io/openhands/agent-server:a6ea6c65e552ac866562a587527f00fbe3714b5b-java-amd64
ghcr.io/openhands/agent-server:lazy-view-watermark-java-amd64
ghcr.io/openhands/agent-server:a6ea6c6-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:a6ea6c6-java-arm64
ghcr.io/openhands/agent-server:a6ea6c65e552ac866562a587527f00fbe3714b5b-java-arm64
ghcr.io/openhands/agent-server:lazy-view-watermark-java-arm64
ghcr.io/openhands/agent-server:a6ea6c6-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:a6ea6c6-python-amd64
ghcr.io/openhands/agent-server:a6ea6c65e552ac866562a587527f00fbe3714b5b-python-amd64
ghcr.io/openhands/agent-server:lazy-view-watermark-python-amd64
ghcr.io/openhands/agent-server:a6ea6c6-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:a6ea6c6-python-arm64
ghcr.io/openhands/agent-server:a6ea6c65e552ac866562a587527f00fbe3714b5b-python-arm64
ghcr.io/openhands/agent-server:lazy-view-watermark-python-arm64
ghcr.io/openhands/agent-server:a6ea6c6-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:a6ea6c6-golang
ghcr.io/openhands/agent-server:a6ea6c65e552ac866562a587527f00fbe3714b5b-golang
ghcr.io/openhands/agent-server:lazy-view-watermark-golang
ghcr.io/openhands/agent-server:a6ea6c6-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:a6ea6c6-java
ghcr.io/openhands/agent-server:a6ea6c65e552ac866562a587527f00fbe3714b5b-java
ghcr.io/openhands/agent-server:lazy-view-watermark-java
ghcr.io/openhands/agent-server:a6ea6c6-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:a6ea6c6-python
ghcr.io/openhands/agent-server:a6ea6c65e552ac866562a587527f00fbe3714b5b-python
ghcr.io/openhands/agent-server:lazy-view-watermark-python
ghcr.io/openhands/agent-server:a6ea6c6-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., a6ea6c6-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., a6ea6c6-python-amd64) are also available if needed

Refs: #3053

Alternative to the push-based callback approach in #3324.  Instead of
eagerly syncing the cached View via EventLog._on_append callbacks, the
View is lazily brought up to date on read using an integer watermark
that tracks how many events have been incorporated.

Changes:
- state.py: _view, _view_watermark, view property, rebuild_view()
- agent.py: rebuild_view() in step/astep malformed-history handlers
- local_conversation.py: rebuild_view() after fork event copy
- condenser/base.py: docstring read-only warning

Files NOT changed (unlike #3324):
- event_store.py: no _on_append callback, no set_on_append()
- local_conversation.py: _default_callback unchanged, close() unchanged,
  condense() unchanged — no lock-wrapping changes needed

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions github-actions Bot added the release-note-required PR requires explicit release-note coverage for behavioral or default changes label May 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 20, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/agent
   agent.py3653989%101, 215, 340, 344, 612–613, 620–621, 707, 711–712, 717–719, 721, 731–732, 749–750, 757–758, 781, 788, 792, 796, 799–800, 802–803, 817–818, 1105–1106, 1108, 1136, 1144–1145, 1179, 1186
openhands-sdk/openhands/sdk/context/condenser
   base.py692268%77, 191–192, 210–213, 215–216, 218–219, 221–223, 226–229, 231, 233, 242, 253
openhands-sdk/openhands/sdk/conversation
   state.py221597%434, 482–484, 625
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py5314491%308, 313, 460, 506, 543, 559, 624, 844–845, 848, 961, 972–975, 982–983, 986, 992–993, 996, 1002, 1017, 1020, 1024–1025, 1029–1031, 1038, 1124, 1129, 1239, 1241, 1245–1246, 1257–1258, 1283, 1478, 1482, 1552, 1559–1560
TOTAL27749821870% 

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the new SDK ConversationState.view cache behavior through real SDK usage; it works for incremental updates, condensation, persistence resume, fork, and malformed-history recovery.

Does this PR achieve its stated goal?

Yes. The PR set out to add a lazily-updated, watermark-maintained View cache on ConversationState plus rebuild hooks for cold load/fork/recovery; exercising the SDK directly showed the cache starts empty, catches up after real event appends, matches View.from_events, applies condensation, survives resume/fork, and is rebuilt before malformed-history condensation retry. No functional issues were found.

Phase Result
Environment Setup uv run python created/used the project environment and imported openhands.sdk successfully.
CI Status 🟡 Most checks green, including SDK/tests/pre-commit/API checks; image publish, PR review, and this QA workflow were still in progress when checked.
Functional Verification ✅ Real SDK scripts exercised the changed behavior successfully; no test suite, linter, or formatter was run.
Functional Verification

Test 1: ConversationState.view cache, condensation, resume, and rebuild

Step 1 — Establish baseline on origin/main:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_view_cache.py after checking out origin/main:

created_state=c88009b0-32a8-4123-8511-75e751db1308
AttributeError: 'ConversationState' object has no attribute 'view'
BASE_EXIT=1

This confirms the base branch does not expose the new cached view surface.

Step 2 — Apply the PR's changes:
Checked out lazy-view-watermark at 91d20cba325c9ed9af689e2af3a1e520cd45a883.

Step 3 — Re-run with the PR in place:
Ran the same SDK workflow:

initial_view_events=0
after_appends_ids=0c61...,f17b...,13fa...
view_matches_full_rebuild=True
after_condensation_ids=f17b...,13fa...
first_removed=True
condensation_request_flag=True
resumed_event_count=2
rebuild_replaced_view=True
RESULT=PASS

This shows the lazy cache is initialized, catches up after appends, stays equivalent to a full rebuild, applies condensation, restores on resume, and can be explicitly rebuilt.

Test 2: Conversation fork gets a rebuilt cached view

Step 1 — Baseline:
On origin/main, the cached state.view surface is absent as shown in Test 1, so there is no forked cached view to inspect.

Step 2 — Apply the PR's changes:
Used the PR branch and created a real local Conversation, appended a message event, then called convo.fork().

Step 3 — Verify fork behavior:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_fork_view.py:

source_view_events=1
fork_type=LocalConversation
fork_view_events=1
fork_contains_source_event=True
RESULT=PASS

This confirms a user-level forked conversation exposes a cached view containing the copied source event.

Test 3: Malformed-history recovery rebuilds a stale cached view

Step 1 — Baseline:
The base branch has no cached state.view; this recovery hook is specific to the PR's new cache.

Step 2 — Apply the PR's changes:
On the PR branch, I created a real Conversation with a custom LLM that raises LLMMalformedConversationHistoryError, deliberately corrupted the cached view, and invoked agent.step(...).

Step 3 — Verify recovery behavior:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_malformed_recovery.py:

LLM raised malformed conversation history error, triggering condensation retry with condensed history: qa malformed history
corrupted_cached_view_events=0
view_after_recovery_events=2
view_contains_original_event=True
condensation_request_emitted=True
RESULT=PASS

This shows Agent.step recovered the stale cache before emitting the CondensationRequest path, matching the PR's stated error-recovery goal.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

@csmith49 csmith49 requested a review from all-hands-bot May 20, 2026 23:20
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the SDK-facing lazy ConversationState.view cache behavior end-to-end via real SDK calls; the PR achieves its stated goal with no QA issues found.

Does this PR achieve its stated goal?

Yes. The PR sets out to add a lazily-updated, watermark-based View cache on ConversationState, plus rebuild behavior for cold load, fork, and malformed-history recovery. Running the SDK on the PR commit showed the new API exists and works for incremental reads, condensation projection, full rebuild parity, persistence resume, conversation fork, and malformed-history recovery (rebuild_calls: 1 before CondensationRequest).

Phase Result
Environment Setup make build completed successfully and installed the uv workspace dependencies.
CI Status 🟡 31 checks successful, 2 skipped, 2 automation checks still in progress at review time (pr-review, qa-changes).
Functional Verification ✅ Real SDK operations passed on PR commit; baseline confirmed the new cache API was absent on origin/main.
Functional Verification

Test 1: Establish baseline without the PR

Step 1 — Reproduce / establish baseline (without the fix):
Ran:

git checkout --quiet origin/main
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_view_cache.py > /tmp/qa_base_final.raw 2>&1
grep '^{"event"' /tmp/qa_base_final.raw

Output:

{"event": "sdk_import_ok"}
{"event": "api_surface", "has_view": false, "has_rebuild_view": false}
{"event": "baseline_stop", "reason": "ConversationState view cache API is unavailable"}

This confirms the baseline SDK does not expose the new ConversationState.view / rebuild_view() surface, so there is no lazy view cache to exercise before the PR.

Step 2 — Apply the PR's changes:
Checked out the PR commit:

git checkout --quiet cb9d2fd0143300e4b3b07691b26694b34a08aa76

Step 3 — Re-run with the fix in place:
Ran:

OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_view_cache.py > /tmp/qa_pr_final.raw 2>&1
grep '^{"event"' /tmp/qa_pr_final.raw

Output:

{"event": "sdk_import_ok"}
{"event": "api_surface", "has_view": true, "has_rebuild_view": true}
{"event": "first_read", "view_len": 1, "matches_first": true}
{"event": "incremental_read", "same_view_instance": true, "view_len": 2}
{"event": "condensation_projection", "view_len": 2, "dropped_first": true, "kept_second": true, "kept_third": true}
{"event": "rebuild", "replaced_instance": true, "matches_from_events": true, "watermark_len": 4}
{"event": "resume", "view_len": 1, "same_event_id": true}
{"event": "fork", "source_view_len": 2, "fork_view_len": 2, "same_count": true}
{"event": "malformed_history_recovery", "rebuild_calls": 1, "emitted_condensation_request": true}

This exercised the changed behavior as SDK users would: creating ConversationState, appending real events, reading state.view, applying Condensation, calling rebuild_view(), resuming persisted state, forking a Conversation, and driving Agent.step() through malformed-history recovery. The output shows the lazy cache catches up incrementally, condensation is reflected in the cached projection, rebuild matches View.from_events, cold resume and fork populate the view, and malformed-history recovery rebuilds before emitting a condensation request.

Issues Found

None.

This QA review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid engineering. The pull-based watermark is the right call — it eliminates every synchronization concern from #3324 by simply not touching the view on the write path. The implementation is minimal (123 lines vs 557), the tests are comprehensive, and the additive-only surface means no behavior change until #3328 lands.

Two issues flagged below: one about thread-safety guarantees that should be documented before the hot path is wired up in the follow-up, and one about the stale-reference semantics of rebuild_view().


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

Comment thread openhands-sdk/openhands/sdk/conversation/state.py
Comment thread openhands-sdk/openhands/sdk/conversation/state.py Outdated
Comment thread tests/sdk/conversation/test_state_view_cache.py
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Taste Rating: Acceptable, but the cache currently has ownership/concurrency holes that can corrupt state.

[RISK ASSESSMENT] 🟡 MEDIUM — this changes SDK conversation state / condenser cache semantics and could affect LLM history. Since there is no eval evidence in the PR, I’m leaving a COMMENT for human maintainer/eval follow-up rather than approval.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26195669127

Comment thread openhands-sdk/openhands/sdk/conversation/state.py Outdated
Comment thread openhands-sdk/openhands/sdk/conversation/state.py Outdated
- Guard view property and rebuild_view() with a threading.Lock so
  concurrent readers cannot double-append into the cached View.
- Document in rebuild_view() that previously returned View references
  are invalidated after the call.
- Extend test_resume_path_rebuilds_view to verify the watermark is
  correctly positioned after resume by appending an event and checking
  it appears incrementally.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions github-actions Bot removed the release-note-required PR requires explicit release-note coverage for behavioral or default changes label May 21, 2026
@csmith49 csmith49 requested a review from all-hands-bot May 21, 2026 14:46
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull-based watermark design is the right call — it eliminates all the synchronization complexity of #3324 by simply not touching the view on the write path. The implementation is minimal and clean (123 lines vs 557), the additive-only surface means no behavior change until #3328 lands, and the previous review's thread-safety concerns have been properly addressed with _view_lock.

Two new observations below: one maintenance-safety flag on the lock type, and a minor test style note.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

Comment thread openhands-sdk/openhands/sdk/conversation/state.py Outdated
Comment thread tests/sdk/agent/test_agent_context_window_condensation.py Outdated
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ QA Report: PASS WITH ISSUES

Functional SDK verification passed for the lazy ConversationState.view cache, including append catch-up, condensation, rebuild, resume, fork, and malformed-history recovery; one CI job is currently failing.

Does this PR achieve its stated goal?

Yes. The PR set out to add a lazily-updated, watermark-based View cache on ConversationState as additive groundwork without changing normal conversation behavior. I verified the base branch lacks state.view/rebuild_view, then exercised the PR branch through real SDK scripts: state.view tracks appended events, applies condensation, survives persistence resume, is rebuilt on demand, is rebuilt for forks, and malformed-history recovery restores a corrupted cached view before emitting CondensationRequest.

Phase Result
Environment Setup uv sync --frozen --dev completed successfully
CI Status ⚠️ Most checks pass, but Consolidate Build Information is failing; qa-changes/pr-review are still pending
Functional Verification ✅ Real SDK usage matched the PR's stated behavior
Functional Verification

Test 1 — Base branch establishes the old behavior

Step 1 — Reproduce / establish baseline without the fix:
Ran git checkout main && uv run python - <<'PY' ... with a script that created a ConversationState, appended two MessageEvents, and attempted to use the new cache API:

baseline_manual_view_events= 2
baseline_has_state_view= False
baseline_has_rebuild_view= False
baseline_state_view_access= AttributeError

This shows the old SDK required rebuilding a View from events manually and did not expose the new cached state.view or rebuild_view() APIs.

Step 2 — Apply the PR's changes:
Checked out lazy-view-watermark at 01dd45a60839695b2ef1fc6d2a24ce25b503d6e4.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python - <<'PY' ... with a script that created ConversationState instances and used the new cache API with real SDK event appends:

cache.empty_view_events 0
cache.after_two_events 2
cache.repeated_read_same_instance True
cache.after_third_events 3
cache.condensation_removed_first True
cache.condensation_kept_others True
cache.rebuild_replaced_view True
cache.rebuild_matches_full_view True
cache.resumed_view_events 2
cache.resumed_after_append_events 3

This confirms the new cached view starts empty, lazily catches up after appends, repeated reads are idempotent, condensation affects the cached view, rebuild_view() re-derives the cache to match View.from_events, and cold resume initializes the cache so later appends continue working.

Test 2 — Conversation fork rebuilds the copied view

Ran the PR branch SDK script using Conversation(..., visualizer=None), send_message(...), and fork():

fork.source_view_events 2
fork.forked_view_events 2
fork.fork_after_append_events 3
fork.source_after_fork_append_events 2

This shows a forked conversation gets a rebuilt cached view matching the source history, and subsequent fork-only events update only the fork's cache without mutating the source conversation's cache.

Test 3 — Malformed-history recovery rebuilds before condensation retry

Ran the PR branch SDK script with a real Agent.step(...) path using an LLM subclass that raises LLMMalformedConversationHistoryError and a condenser that handles condensation requests. Before stepping, I intentionally removed one event from the cached view to simulate the cache corruption the recovery path is intended to repair:

malformed.corrupted_view_events_before_step 1
malformed.view_events_when_request_emitted 2
malformed.original_view_count 2
malformed.removed_event_restored True
malformed.emitted_condensation_request True

This confirms Agent.step() rebuilt the cached view back to the full event count before emitting CondensationRequest, and the malformed-history recovery path continued instead of raising.

Issues Found

  • 🟡 CI: Consolidate Build Information is currently failing on the PR. I did not find functional SDK issues in manual QA, but this CI failure should be resolved or confirmed unrelated before merge.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Taste Rating: Acceptable — the pull-based cache is simpler than the previous push approach, but one cache/watermark race still needs tightening.

[RISK ASSESSMENT] 🟡 MEDIUM — this touches SDK conversation history/condenser semantics and could affect agent/eval behavior; I’m leaving a COMMENT for maintainer/eval follow-up rather than approval.

VERDICT: Worth merging after the bounded rebuild issue is fixed.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:

  1. Add a .agents/skills/custom-codereview-guide.md file to your branch (or edit it if one already exists) with the /codereview trigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.
  2. Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
  3. When your PR is merged, the guideline file goes through normal code review by repository maintainers.

Resolve with AI? Install the iterate skill in your agent and run /iterate to automatically drive this PR through CI, review, and QA until it's merge-ready.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26233370659

Comment thread openhands-sdk/openhands/sdk/conversation/state.py
csmith49 and others added 2 commits May 21, 2026 10:23
- Switch _view_lock from threading.Lock to threading.RLock so future
  reentrant paths (e.g. ViewProperty touching state.view) cannot
  deadlock silently.
- Snapshot events in rebuild_view() before building the View so the
  watermark is tied to exactly the events that were materialized,
  closing a theoretical TOCTOU race with concurrent appenders.
- Move 'from unittest.mock import patch' to module-level in
  test_agent_context_window_condensation.py (was duplicated in two
  function bodies with no circular-import justification).

Co-authored-by: openhands <openhands@all-hands.dev>
@csmith49 csmith49 requested a review from all-hands-bot May 21, 2026 20:59
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the SDK behavior introduced by this PR by running standalone ConversationState/LocalConversation exercises; the lazy view cache, resume rebuild, fork rebuild, and malformed-history recovery path behaved as claimed.

Does this PR achieve its stated goal?

Yes. The PR sets out to add a lazily-updated, watermark-based ConversationState.view cache plus rebuild hooks for cold load/fork/error recovery. On origin/main, the same SDK exercises show ConversationState.view/rebuild_view are absent; on lazy-view-watermark at 5a96980498f695448beafb7d6edad54ee2bc4206, appended events incrementally appeared in state.view, condensation state matched View.from_events(...), resume continued incrementally after rebuilding, forked conversations had matching cached views, and malformed-history recovery called rebuild_view() before emitting CondensationRequest.

Phase Result
Environment Setup make build completed successfully; no test suite, linter, formatter, or pre-commit run was performed.
CI Status 🟡 GitHub checks showed 31 successful, 2 skipped, 2 pending (PR Review by OpenHands, QA Changes by OpenHands) at verification time.
Functional Verification ✅ Standalone SDK scripts exercised the new public state cache API and affected conversation paths successfully.
Functional Verification

Test 1: Lazy ConversationState.view cache, condensation, rebuild, and resume

Step 1 — Establish baseline without the PR:
Checked out origin/main and ran uv run python /tmp/qa_view_cache_exercise.py:

BASE_HEAD=8f406a88
state_view_property=missing
result=baseline lacks lazy view cache API
view_exit=2

This confirms the base branch does not expose the new lazy view cache API, so the PR is adding the behavior under review.

Step 2 — Apply the PR's changes:
Checked out lazy-view-watermark at 5a96980498f695448beafb7d6edad54ee2bc4206.

Step 3 — Re-run with the PR in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_view_cache_exercise.py:

initial_view_event_count=3
matches_full_rebuild=True
repeated_reads_same_object=True
condensation_removed_first=True
condensed_view_event_count=2
condensation_request_flag=True
rebuild_replaced_cached_instance=True
rebuild_matches_full_rebuild=True
resume_view_event_count=2
resume_incremental_event_count=3
result=lazy view cache behavior verified
view_exit=0

This shows appended events are reflected lazily, repeated reads reuse the cached instance, condensation semantics match a full View.from_events(...) rebuild, explicit rebuild replaces the cached view with a fully-derived view, and resumed persisted state continues incremental updates correctly.

Test 2: Malformed-history recovery rebuilds the cached view

Step 1 — Establish baseline without the PR:
On origin/main, ran uv run python /tmp/qa_malformed_recovery_exercise.py:

rebuild_view_method=missing
result=baseline cannot rebuild cached view during malformed-history recovery
malformed_exit=2

This confirms the base branch cannot perform the new recovery behavior because the rebuild API does not exist there.

Step 2 — Apply the PR's changes:
Checked out lazy-view-watermark at 5a96980498f695448beafb7d6edad54ee2bc4206.

Step 3 — Re-run with the PR in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_malformed_recovery_exercise.py with an SDK LLM subclass that raises LLMMalformedConversationHistoryError:

rebuild_view_calls=1
emitted_condensation_request=True
result=malformed-history recovery path verified
malformed_exit=0

This shows the real Agent.step(...) recovery path calls ConversationState.rebuild_view() and still emits the expected CondensationRequest.

Test 3: LocalConversation fork rebuild path

Step 1 — Establish baseline / affected path:
The PR description calls out fork creation as a cache rebuild path. I exercised the public LocalConversation.fork() flow with a conversation containing a user message.

Step 2 — Apply the PR's changes:
Ran on lazy-view-watermark at 5a96980498f695448beafb7d6edad54ee2bc4206.

Step 3 — Exercise the fork path:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_fork_view_cache_exercise.py:

source_view_event_count=2
fork_view_event_count=2
fork_view_matches_source=True
result=fork view cache behavior verified

This shows the forked conversation's cached view is populated and matches the source conversation after events are copied.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Taste Rating: Acceptable — the pull-based cache is much simpler, but there is one cache-consistency bug to fix before this is safe to build on.

[RISK ASSESSMENT] 🟡 MEDIUM — this touches conversation history/view state and condenser recovery semantics, so maintainer/eval judgment is still appropriate before merge.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26252792628

Comment thread openhands-sdk/openhands/sdk/conversation/state.py Outdated
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull-based watermark is the right call — eliminating the push-based callback entirely removes every synchronization concern that plagued #3324. All 8 previous review threads are resolved in the current code, and the implementation is clean: the RLock, the snapshot approach in rebuild_view(), and the watermark-position-after-rebuild test are all solid. Two minor fresh observations below.


This review was generated by an AI agent (OpenHands) on behalf of the user via OpenHands Automation.

Comment thread openhands-sdk/openhands/sdk/conversation/state.py Outdated
Comment thread tests/sdk/agent/test_agent_context_window_condensation.py
…ocstring

- Advance _view_watermark after each successful append_event in the
  view property loop, so a mid-loop failure cannot leave already-
  appended events eligible for replay on the next read.
- Document the implicit exception-safety of rebuild_view(): if
  View.from_events raises, the cache is left unchanged.
- Decline instance-level patch.object suggestion: Pydantic's
  __setattr__ prevents setting arbitrary attributes on model
  instances, so class-level patching is the correct approach here.

Co-authored-by: openhands <openhands@all-hands.dev>
@csmith49 csmith49 requested a review from all-hands-bot May 21, 2026 21:27
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the new ConversationState view cache through real SDK usage: lazy reads, condensation, persistence resume, rebuild recovery, malformed-history recovery, and conversation fork all behaved as described.

Does this PR achieve its stated goal?

Yes. The PR set out to add a lazily-updated, watermark-maintained View cache on ConversationState as an additive groundwork change, with rebuild paths for cold load, fork, and malformed-history recovery. I exercised those SDK paths directly: main has no state.view, while this PR exposes it and keeps it in sync across appended events, condensation, resume, explicit rebuild, malformed-history recovery, and Conversation.fork().

Phase Result
Environment Setup make build completed successfully and installed the uv dev environment.
CI Status ⚠️ Latest check glance: most checks passing; unresolved-review-threads failing, pr-review/qa-changes pending, cleanup/artifact review checks skipped.
Functional Verification ✅ Real SDK scripts returned ok: true for PR behavior; no test suite, linters, or type checks were run locally.
Functional Verification

Test 1: ConversationState lazy view cache, condensation, resume, and rebuild

Step 1 — Establish baseline on main:
Ran git checkout main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_state_view_cache_quiet.py:

{
  "ok": false,
  "checks": {
    "has_view_attr": false
  },
  "error_type": "AttributeError",
  "error": "'ConversationState' object has no attribute 'view'"
}

This shows the cached ConversationState.view API is not present on the base branch, so the PR is adding the intended new surface.

Step 2 — Apply the PR's changes:
Checked out lazy-view-watermark at 26711fff.

Step 3 — Re-run with the PR in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_state_view_cache_quiet.py:

{
  "ok": true,
  "checks": {
    "has_view_attr": true,
    "initial_view_count": 3,
    "repeated_read_same_object": true,
    "incremental_matches_from_events": true,
    "condensation_removed_first": true,
    "condensation_view_count": 2,
    "condensation_request_flag": true,
    "resume_view_count": 2,
    "resume_incremental_count": 3,
    "rebuild_replaces_view": true,
    "rebuild_matches_from_events": true
  }
}

This confirms the new cache lazily catches up from appended events, repeated reads are idempotent, incremental state matches View.from_events, condensation is reflected, persisted resume initializes the cache, post-resume appends are tracked, and rebuild_view() replaces/re-derives the cache.

Test 2: Malformed-history recovery rebuilds the cached view before retry

Step 1 — Baseline context:
The base branch result above shows there is no cached state.view to rebuild. The PR-specific claim is that malformed-history recovery refreshes the new cache before emitting a condensation retry.

Step 2 — Apply the PR's changes:
Used lazy-view-watermark at 26711fff with a real Agent.step() call and a local LLM subclass that raises LLMMalformedConversationHistoryError.

Step 3 — Exercise recovery with the PR in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_malformed_recovery_quiet.py:

{
  "ok": true,
  "checks": {
    "condensation_request_emitted": true,
    "cached_view_rebuilt": true
  },
  "seen_events": [
    "CondensationRequest"
  ]
}

This confirms the malformed-history path emits the expected retry request and invalidates/rebuilds the cached view before continuing.

Test 3: Conversation fork preserves and continues the cached view

Step 1 — Baseline context:
As in Test 1, main has no state.view, so forked cached-view behavior is additive in this PR.

Step 2 — Apply the PR's changes:
Used lazy-view-watermark at 26711fff and exercised Conversation.send_message() followed by Conversation.fork().

Step 3 — Exercise fork with the PR in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_conversation_fork_view_quiet.py:

{
  "ok": true,
  "checks": {
    "source_view_had_message": true,
    "fork_view_rebuilt_from_source": true,
    "fork_view_tracks_new_message": true,
    "fork_view_matches_full_rebuild": true
  }
}

This confirms a forked conversation receives a rebuilt view from copied events and continues tracking new events consistently with a full View.from_events rebuild.

Issues Found

None from functional QA.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Taste Rating: Acceptable, but there’s one cache-consistency hole in rebuild watermarking (inline).

[RISK ASSESSMENT] 🟡 MEDIUM — this touches conversation history/condenser state and could affect LLM message construction once state.view is wired into the hot path, so I’m leaving a COMMENT for maintainer/eval follow-up rather than approval.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26254102500

Comment thread openhands-sdk/openhands/sdk/conversation/state.py Outdated
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull-based watermark design is a clear improvement over the push-based approach in #3324 — by not touching the view on the write path, every synchronization hazard disappears. All 10 previous review threads are resolved, including thread safety (RLock), watermark atomicity, snapshot safety in rebuild_view(), and exception transparency. Two small observations below that didn't appear in earlier rounds.

This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

Comment thread openhands-sdk/openhands/sdk/conversation/state.py
Comment thread openhands-sdk/openhands/sdk/agent/agent.py
@csmith49 csmith49 requested a review from all-hands-bot May 21, 2026 23:03
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull-based watermark design is the right call, and this implementation is clean and correct. Compared to #3324, the reduction in complexity — zero changes to EventLog, no callback wiring, no lock-ordering concerns — is significant and makes the diff straightforward to reason about.

All 14 threads from previous review rounds are resolved. A fresh pass over the current diff doesn't surface any new blocking issues.

What looks good

  • view property — incrementally catches up inside _view_lock, advancing _view_watermark only after a successful append_event. If the append raises, the watermark correctly stays at the failing index. The RLock choice is appropriate given the resolved concern about potential reentrancy on other call paths.
  • rebuild_view() — takes a snapshot first, so View.from_events runs outside the window where _events could grow. If from_events raises, neither _view nor _view_watermark is mutated — clean exception-safety guarantee.
  • agent.py — both step and astep call rebuild_view() before emitting CondensationRequest, ensuring the condenser sees a fully-enforced view on the retry. The async acompletion/aresponses stubs on MalformedHistoryRaisingLLM correctly close the gap that left astep untested.
  • condenser/base.py docstring — explicitly contracts that the view argument is read-only and explains why (View may be a cached projection). The suggested return forms are correct: returning a new View or a Condensation avoids corrupting the cache.
  • test_resume_path_rebuilds_view — the final block verifies that the watermark is correctly positioned after rebuild_view(), so incremental appends continue to work post-resume. This guards the exact regression the comment calls out.
  • test_view_matches_full_rebuild — parity test between incremental and View.from_events output is the right canary for divergence.

One minor suggestion

The view property docstring documents the read-only contract for callers, and rebuild_view() documents that previously returned references are invalidated — but the two disclosures are directionally asymmetric. A one-liner cross-reference in the view docstring (e.g., "See rebuild_view() for when a returned reference may become stale.") would make the contract discoverable from either entry point without requiring the reader to hunt in both places.

This is non-blocking; the information exists and the tests cover the behavior.


This review was generated by an AI agent (OpenHands) on behalf of @csmith49 via OpenHands Automation.

@csmith49 csmith49 requested a review from all-hands-bot May 22, 2026 15:46
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

Verified the SDK-level cached ConversationState.view works through real SDK usage, including lazy updates, persistence resume, fork, condensation, and malformed-history recovery.

Does this PR achieve its stated goal?

Yes. The PR set out to add a lazily-updated, watermark-based View cache on ConversationState as a drop-in groundwork change. On origin/main, a real SDK script confirmed ConversationState had no view attribute; on this PR, real Conversation/ConversationState usage showed state.view updates after send_message() and direct event appends, survives rebuild_view(), cold-loads correctly on persistence resume, is rebuilt for forks, matches View.from_events() after condensation, and still emits a CondensationRequest on malformed-history recovery.

Phase Result
Environment Setup make build completed successfully with uv 0.11.16
CI Status ⚠️ Observed 30 successful, 1 failing (Review Thread Gate/unresolved-review-threads), 2 pending, 4 skipped; I did not rerun tests
Functional Verification ✅ Exercised the changed SDK behavior with real scripts; no functional issues found
Functional Verification

Test 1: Baseline absence vs PR lazy view cache behavior

Step 1 — Reproduce / establish baseline without the fix:
Ran git checkout --detach origin/main && uv run python - <<'PY' ... PY with a script that created a ConversationState, appended a MessageEvent, and attempted to read state.view.

Observed output excerpt:

baseline_event_log_len 1
baseline_has_view_attr False
baseline_view_access_error AttributeError 'ConversationState' object has no attribute 'view'

This establishes the baseline: SDK users could not read a cached conversation view from ConversationState on main.

Step 2 — Apply the PR's changes:
Checked out lazy-view-watermark at 26711fff417a690c3c73f7936e11e1af232d27b4.

Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python - <<'PY' ... PY with real Conversation usage: create a conversation, call send_message(), read convo.state.view, append directly to convo.state.events, call rebuild_view(), resume the persisted conversation, and fork it.

Observed output excerpt:

pr_send_message_event_log_len 2
pr_view_len_after_send_message 2
pr_repeated_reads_same_instance True
pr_event_log_len_after_direct_append 3
pr_view_len_after_direct_append 3
pr_direct_append_visible_in_view True
pr_rebuild_replaces_view_instance True
pr_rebuild_preserves_visible_count 3
pr_resumed_view_len 3
pr_resumed_view_len_after_new_message 4
pr_fork_view_len 4
pr_fork_matches_source_view_len True

This shows the PR adds the promised cached view surface and keeps it current through user-message events, direct event-log appends, rebuilds, persistence resume, and fork creation.

Test 2: Cached view remains semantically correct through condensation

Step 1 — Baseline:
On main, state.view is unavailable as shown in Test 1.

Step 2 — Apply the PR's changes:
Used the PR branch and created a ConversationState, appended three messages, appended a Condensation, compared the cached view IDs to View.from_events(state.events), then appended a CondensationRequest.

Step 3 — Run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python - <<'PY' ... PY.

Observed output excerpt:

pr_condense_view_len_before 3
pr_condense_view_len_after 2
pr_condense_removed_first True
pr_cached_view_matches_fresh_view True
pr_condensation_request_flag True

This confirms the cached view preserves expected View behavior for condensation and remains equivalent to a fresh full view rebuild for the exercised history.

Test 3: Malformed-history recovery still emits condensation request

Step 1 — Baseline:
The new cache-specific recovery path does not exist on main because ConversationState.view is absent.

Step 2 — Apply the PR's changes:
Used the PR branch with a real custom LLM subclass that raises LLMMalformedConversationHistoryError and a condenser that handles condensation requests.

Step 3 — Run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python - <<'PY' ... PY and called agent.step(convo, on_event=seen.append).

Observed output excerpt:

pr_malformed_before_view_count 2
pr_malformed_completion_calls 1
pr_malformed_emitted_condensation_request True
pr_malformed_view_count_after_step 2

This confirms the malformed-history path remains functional from a user-extensible SDK flow: the LLM error is handled and a CondensationRequest is emitted without crashing.

Issues Found

None.

This review was created by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull-based watermark is clearly the right design, and after the iterative fixes across this review cycle the implementation is clean and correct. All previously raised threads have been resolved: _view_lock (RLock) serializes concurrent view/rebuild_view calls, the watermark advances after each successful append_event (exception-safe), rebuild_view uses a snapshot so the watermark can't diverge from the events that actually fed the rebuilt view, and both step/astep rebuild before emitting CondensationRequest.

One minor test gap flagged inline — non-blocking, but worth closing before the follow-up (#3328) wires state.view into the hot path.

Aside — I/O under _view_lock: The view property holds _view_lock while iterating self._events[i] for each uncached event. If EventLog.__getitem__ involves disk I/O (it takes a file_store/dir_path), blocking I/O occurs under the lock, serialising any concurrent rebuild_view call. For the incremental path (2–4 events per step) this is unlikely to matter; if EventLog memory-caches after load it's a non-issue. Worth a quick confirmation before #3328.


This review was generated by an AI agent (OpenHands) on behalf of @csmith49 through OpenHands Automation.

Comment thread tests/sdk/conversation/test_state_view_cache.py
Covers the gap where test_view_matches_full_rebuild only used
MessageEvents. The new test asserts state.view == View.from_events
on a mixed sequence that includes a Condensation event.

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Taste Rating: Acceptable — the pull-based design is clean, but one cold-load watermark consistency bug needs attention.

[RISK ASSESSMENT] 🟡 MEDIUM — this touches conversation history/view cache semantics and the future condenser hot path, so maintainer/eval judgment is still appropriate before merge.

VERDICT: Worth merging after the inline watermark bug is fixed.

This review was generated by an AI agent (OpenHands) on behalf of the user.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26297602298

Comment thread openhands-sdk/openhands/sdk/conversation/state.py Outdated
csmith49 and others added 2 commits May 22, 2026 10:32
The snapshot was added to guard a theoretical TOCTOU race between
from_events iteration and watermark assignment, but rebuild_view
holds the view lock and event appends happen on the same thread.
Passing self._events directly is simpler and avoids the copy.

Co-authored-by: openhands <openhands@all-hands.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants