refactor(cloud-agent-next): make wrapper the single control plane #1118

Merged: eshurakov merged 3 commits into main from eshurakov/cloud-agent-random-port on Mar 17, 2026

Conversation

eshurakov (Contributor) commented Mar 16, 2026

Summary

Refactor cloud-agent-next so the wrapper process becomes the single control plane inside the sandbox container. Instead of the DO/Worker managing two separate processes (kilo server + wrapper), the wrapper starts and owns the kilo server in-process via @kilocode/sdk's createKilo(). This eliminates split-lifetime bugs, simplifies lifecycle management, and reduces the DO's knowledge of container internals.

Architectural changes:

  • Wrapper as control plane: The wrapper starts the kilo server in-process via createKilo(), replacing the DO-managed server-manager. The DO now manages only one process (the wrapper).
  • Simplified health monitoring: Replaced complex health checks with a 4-layer model — wrapper transport reconnect (15s), DO stale heartbeat (90s), DO hung session (~5 min), and DO max runtime (30 min).
  • Wrapper startup identity: Session identity (--agent-session, --user-id, --session-id) is now passed via CLI args at wrapper startup. Per-execution config (autoCommit, model, etc.) is passed via POST /job/prompt body.
  • Wrapper version pinning: Worker and wrapper now agree on a version string (WRAPPER_VERSION); mismatched wrappers are restarted.
  • Random port for kilo server: Kilo server now binds to a random port (10000–60000) inside the container instead of a fixed port.
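
As an illustration of the random-port bullet above, here is a minimal sketch of picking a port in the 10000–60000 range. The function names, the retry count, and the `isFree` probe are hypothetical and are not taken from ports.ts; the inclusive upper bound is also an assumption.

```typescript
// Hypothetical sketch; names and retry policy are illustrative, not from ports.ts.
const PORT_MIN = 10_000;
const PORT_MAX = 60_000; // assumed inclusive

/** Returns an integer port in [PORT_MIN, PORT_MAX]; the RNG is injectable for tests. */
function pickRandomPort(random: () => number = Math.random): number {
  const span = PORT_MAX - PORT_MIN + 1;
  return PORT_MIN + Math.floor(random() * span);
}

/** Draws candidate ports until `isFree` accepts one or the attempts run out. */
function pickFreePort(
  isFree: (port: number) => boolean,
  maxAttempts = 20,
  random: () => number = Math.random,
): number {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const port = pickRandomPort(random);
    if (isFree(port)) return port;
  }
  throw new Error(`no free port found after ${maxAttempts} attempts`);
}
```

In a real wrapper the `isFree` probe would presumably attempt a bind, and the chosen port would be passed to the in-process server at startup; both details are outside what this PR description states.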

Deleted modules (net −1159 lines): server-manager.ts, kilo-types.ts, sse-consumer.ts, kilo-client.ts — replaced by kilo-api.ts, ports.ts, wrapper-version.ts.

Frontend fix: Fixed a streaming regression where message text disappeared after completion. Completed messages now stay buffered for late metadata/part updates.

Verification

  • pnpm typecheck — passes (tsgo + wrapper tsc)
  • Wrapper unit tests — 640+ tests pass across 26 test files
  • Wrapper build — succeeds
  • Manual check

Visual Changes

N/A

Reviewer Notes

  • The createKilo() SDK call in wrapper/src/main.ts is the central architectural change — start review there.
  • kilo-api.ts is the new adapter between wrapper modules and the @kilocode/sdk client; operations for which the SDK exposes no typed method (commit-message, permission reply) use raw fetch against the in-process server URL.
  • maxRuntimeMs is never sent by the frontend today — always defaults to 30 min.
  • Kilo server sends server.heartbeat every 10s on SSE; the wrapper now forwards it to the DO. DO ingest heartbeat debounce is 30s; stale threshold is 90s (~3 missed heartbeats).
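
The heartbeat numbers in the last note can be sketched as follows: a 30s debounce on persisting heartbeats and a 90s stale threshold, which equals three debounce windows. This is a hedged illustration only; the class and method names are hypothetical and do not come from CloudAgentSession.ts or ingest.ts.

```typescript
// Hypothetical sketch of DO-side heartbeat bookkeeping; names are illustrative.
const HEARTBEAT_DEBOUNCE_MS = 30_000; // from the note above
const STALE_THRESHOLD_MS = 90_000;    // from the note above

class HeartbeatTracker {
  private lastPersistedAt: number;

  constructor(startedAt: number) {
    this.lastPersistedAt = startedAt;
  }

  /** Records a heartbeat; returns true only when it is persisted (not debounced). */
  record(now: number): boolean {
    if (now - this.lastPersistedAt < HEARTBEAT_DEBOUNCE_MS) return false;
    this.lastPersistedAt = now;
    return true;
  }

  /** True once no heartbeat has been persisted for the stale threshold. */
  isStale(now: number): boolean {
    return now - this.lastPersistedAt >= STALE_THRESHOLD_MS;
  }
}
```

Timestamps are passed in explicitly so the logic can be exercised without real timers.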

kilo-code-bot (bot, Contributor) commented Mar 16, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

| Severity | Count |
| --- | --- |
| CRITICAL | 0 |
| WARNING | 1 |
| SUGGESTION | 0 |

Issue Details

WARNING

| File | Line | Issue |
| --- | --- | --- |
| cloud-agent-next/src/persistence/CloudAgentSession.ts | 1249 | Idle wrapper cleanup deletes the wrapper-backed Kilo session, so resumes after the idle timeout fail with "configured session ... not found" |

Other Observations (not in diff)

N/A

Files Reviewed (18 files)
  • cloud-agent-next/src/execution/orchestrator.ts - 0 issues
  • cloud-agent-next/src/kilo/wrapper-client.ts - 0 new issues
  • cloud-agent-next/src/kilo/wrapper-manager.ts - 0 issues
  • cloud-agent-next/src/persistence/CloudAgentSession.ts - 1 issue
  • cloud-agent-next/src/router/handlers/session-prepare.ts - 0 issues
  • cloud-agent-next/src/session/queries/executions.ts - 0 issues
  • cloud-agent-next/src/session/types.ts - 0 issues
  • cloud-agent-next/src/websocket/ingest.ts - 0 new issues
  • cloud-agent-next/wrapper/package.json - 0 issues
  • cloud-agent-next/wrapper/src/connection.ts - 0 new issues
  • cloud-agent-next/wrapper/src/kilo-api.ts - 0 new issues
  • cloud-agent-next/wrapper/src/lifecycle.ts - 0 new issues
  • cloud-agent-next/wrapper/src/main.ts - 0 new issues
  • cloud-agent-next/wrapper/src/server.ts - 0 new issues
  • cloud-agent-next/wrapper/src/state.ts - 0 issues
  • src/components/cloud-agent-next/store/atoms.ts - 0 issues
  • src/lib/cloud-agent-next/processor/event-processor.ts - 0 issues
  • cloud-agent-next/wrangler.jsonc - 0 issues

Reviewed by gpt-5.4-20260305 · 3,053,710 tokens

- Replace rawRequest with v2 SDK and add event subscription generation counter
- Forward variant to SDK, fail on missing session
- Remove unsafe casts, narrow deps, and add crash handlers
- Suppress prefer-const lint for late-bound lifecycleManager
- Remove unnecessary type assertion in client test
- Address PR review warnings
… subscription leaks

Cancel old SDK event subscriptions immediately by propagating the
AbortController's signal to event.subscribe(). Previously, aborting
only set a boolean flag that was checked on the next iteration of the
for-await loop, leaving the underlying HTTP stream alive until the
next SSE event arrived. Now close() and reconnectEventSubscription()
tear down the stream transport instantly.
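
The cancellation pattern described in this commit message can be sketched roughly as below. It is a minimal illustration under stated assumptions: the class name, the `start` callback, and `isCurrent()` are hypothetical, and the generation counter is borrowed from the earlier commit title rather than from the actual wrapper code.

```typescript
// Hypothetical sketch; not the wrapper's actual connection code.
class EventSubscription {
  private controller: AbortController | null = null;
  private generation = 0;

  /** Aborts the previous subscription (if any) and starts a new one. */
  reconnect(start: (signal: AbortSignal, generation: number) => void): AbortSignal {
    this.controller?.abort(); // tears down the old transport right away
    this.controller = new AbortController();
    this.generation += 1;
    start(this.controller.signal, this.generation);
    return this.controller.signal;
  }

  /** True if `generation` belongs to the live subscription. */
  isCurrent(generation: number): boolean {
    return this.controller !== null && generation === this.generation;
  }

  /** Aborts the live subscription without starting a new one. */
  close(): void {
    this.controller?.abort();
    this.controller = null;
  }
}
```

A `start` callback would hand the signal to whatever opens the stream, for example `fetch(url, { signal })`, so that aborting cancels the in-flight request instead of waiting for the next event to arrive.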
eshurakov force-pushed the eshurakov/cloud-agent-random-port branch from 9713dc1 to 1bc69bc on March 16, 2026 21:29
Call callbacks.onSseEvent() when server.connected arrives, matching the pattern used by server.heartbeat and regular kilocode events. This starts the 15-second dead-subscription recovery timer immediately upon SSE subscription instead of waiting 10-30s for the first heartbeat.
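
A rough sketch of why this matters: a recovery timer that is armed only by incoming events cannot detect a dead subscription until the first event arrives. The names below are hypothetical, and only the 15-second window comes from the commit message.

```typescript
// Hypothetical sketch of a dead-subscription watchdog; names are illustrative.
const DEAD_SUBSCRIPTION_MS = 15_000; // from the commit message above

class SubscriptionWatchdog {
  private lastEventAt: number | null = null;

  /** Called for every SSE event, including server.connected. */
  onSseEvent(now: number): void {
    this.lastEventAt = now;
  }

  /** Dead only if the watchdog has been armed and the window has elapsed. */
  isDead(now: number): boolean {
    return this.lastEventAt !== null && now - this.lastEventAt >= DEAD_SUBSCRIPTION_MS;
  }
}
```

Until `onSseEvent()` has been called once, `isDead()` always returns false, so treating server.connected as an event arms the check from the moment the subscription opens.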
eshurakov merged commit 1bdefe3 into main on Mar 17, 2026
18 checks passed
eshurakov deleted the eshurakov/cloud-agent-random-port branch on March 17, 2026 11:35