Fm/debug examples #916

Open
filip-michalsky wants to merge 14 commits into PrimeIntellect-ai:main from filip-michalsky:fm/debug-examples

@filip-michalsky
Contributor

@filip-michalsky filip-michalsky commented Feb 15, 2026

Description

This PR improves the BrowserEnv CUA path and the bundled Browserbase CUA server by making session creation, retries, and sandbox cleanup more resilient, and by surfacing structured
retryable/validation errors back to the caller.

It also adds viewport-aware default CUA prompting and tool descriptions, fixes CUA env_response handling for empty zero-arg tool calls and screenshot-bearing tool messages, and
updates the browser examples/docs to reflect the new behavior and tuning knobs.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Ran uv run pytest tests/test_browser_env.py locally (37 passed in 0.13s).

New coverage was added for:

  • viewport-aware default CUA prompts and tool descriptions
  • CUA env_response screenshot relocation and empty tool-arg normalization
  • session-create retry classification and structured error formatting
  • sandbox setup failure cleanup behavior

I also started uv run pytest, but it did not complete during this verification window, so the full-suite checkbox is left unchecked.

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

Key additions include dedicated CUA session-creation retry settings, per-environment CUA request concurrency limiting, bounded server-side session-create queueing, and structured
validation errors that give the model repairable feedback for bad action arguments.


Note

Medium Risk
Touches CUA control flow (session creation, retries, error handling, and sandbox cleanup) across both the server and Python client, which can affect rollout stability and retry behavior if misclassified. Changes are well-scoped and backed by new tests, but concurrency/timeout tuning may need validation under real load.

Overview
CUA server now returns structured, retry-aware errors and validates action arguments. executeAction throws ActionValidationError with per-field details (e.g., integer pixel coords, non-negative wait), and server.ts maps validation/rate-limit/timeout failures to 400/429/504 with retryable flags and state included on action failures.
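
The status mapping described above can be sketched roughly as follows. The real server is TypeScript (`server.ts`); this is a Python paraphrase for illustration only. The `code` strings, the `RateLimitError` class, and the exact `retryable` values are assumptions, not taken from the PR.

```python
class ActionValidationError(Exception):
    """Raised when an action's arguments fail validation."""

    def __init__(self, message: str, details: dict[str, str]):
        super().__init__(message)
        self.details = details  # per-field messages, e.g. {"x": "must be an integer"}


class RateLimitError(Exception):
    """Hypothetical stand-in for an upstream rate-limit failure."""


def build_error_response(exc: Exception) -> tuple[int, dict]:
    """Map an exception to (HTTP status, JSON body) with a retryable flag."""
    if isinstance(exc, ActionValidationError):
        # Bad arguments will not succeed on retry; the caller must repair them.
        return 400, {
            "code": "validation_error",  # assumed code string
            "retryable": False,
            "error": str(exc),
            "details": exc.details,
        }
    if isinstance(exc, RateLimitError):
        return 429, {"code": "rate_limited", "retryable": True, "error": str(exc)}
    if isinstance(exc, TimeoutError):
        return 504, {"code": "timeout", "retryable": True, "error": str(exc)}
    # Anything unclassified is treated as a non-retryable server error here.
    return 500, {"code": "internal_error", "retryable": False, "error": str(exc)}
```

A caller would wrap action execution in `try`/`except` and send the returned status and body back unchanged, so the client can branch on `retryable` rather than parsing message text.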

Session creation is throttled and classified. sessionManager adds bounded concurrent session creation with a queue (CUA_SESSION_CREATE_MAX_CONCURRENT/CUA_SESSION_CREATE_MAX_PENDING) and surfaces typed SessionCreateError codes/statuses.
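
A minimal sketch of bounded session creation with a capped wait queue, written in Python with `asyncio` for illustration (the actual `sessionManager` is part of the Node server). The class name `SessionCreateLimiter`, the `queue_full` code, and the 429 status for a full queue are assumptions; the two limits correspond to the `CUA_SESSION_CREATE_MAX_CONCURRENT` and `CUA_SESSION_CREATE_MAX_PENDING` settings named above.

```python
import asyncio


class SessionCreateError(Exception):
    """Typed failure carrying a machine-readable code and an HTTP status."""

    def __init__(self, code: str, status: int):
        super().__init__(code)
        self.code = code
        self.status = status


class SessionCreateLimiter:
    """Hypothetical limiter: at most `max_concurrent` creations run at once,
    and at most `max_pending` callers may wait for a free slot."""

    def __init__(self, max_concurrent: int, max_pending: int):
        self._slots = asyncio.Semaphore(max_concurrent)
        self._max_pending = max_pending
        self._pending = 0

    async def run(self, create):
        # Reject immediately when every slot is busy and the queue is full.
        if self._slots.locked() and self._pending >= self._max_pending:
            raise SessionCreateError("queue_full", 429)
        self._pending += 1
        try:
            await self._slots.acquire()
        finally:
            self._pending -= 1
        try:
            return await create()
        finally:
            self._slots.release()
```

Rejecting early keeps the queue from growing without bound under load and gives the client a clear signal to back off.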

BrowserEnv CUA path is more resilient and controllable. CUAMode adds separate retry policies for session creation vs requests, parses structured error payloads to decide retryability, caps per-env concurrent CUA requests (cua_max_concurrent_requests), and cleans up partially-created sandboxes on setup failure; tool descriptions and default CUA prompts are updated to include viewport dimensions and coordinate guidance, and env_response normalizes empty zero-arg tool calls and relocates screenshot parts out of tool messages.
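
On the client side, deciding retryability from a structured error payload might look like the sketch below. The helper names `is_retryable` and `backoff_delay`, the fallback status set, and the default values are assumptions for illustration; `backoff_factor`, `max_backoff_seconds`, and `jitter` mirror attribute names that appear in the diff.

```python
import json
import random


def is_retryable(status: int, body: str) -> bool:
    """Prefer the server's explicit `retryable` flag; fall back to the status code."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        payload = None
    if isinstance(payload, dict) and isinstance(payload.get("retryable"), bool):
        return payload["retryable"]
    # Assumed fallback for unstructured responses.
    return status in (429, 502, 503, 504)


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    backoff_factor: float = 2.0,
    max_backoff_seconds: float = 30.0,
    jitter: float = 0.0,
) -> float:
    """Exponential backoff capped at `max_backoff_seconds`, plus optional jitter."""
    delay = min(base * backoff_factor**attempt, max_backoff_seconds)
    return delay + random.uniform(0, jitter)
```

Keeping separate policies for session creation and for ordinary requests then amounts to calling `backoff_delay` with two different parameter sets.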

Docs/examples are updated accordingly, and new Node + pytest tests cover structured errors, retry classification, prompt/tool descriptions, env_response behavior, and sandbox cleanup.

Written by Cursor Bugbot for commit 3f861d5. This will update automatically on new commits.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

x,
y,
scroll_x,
scroll_y,
Drag action skips new structured validation error pattern

Medium Severity

The drag action (and default unknown-action case) still uses the old pattern of returning { success: false, error: "..." } instead of throwing ActionValidationError like all other refactored actions (click, type, scroll, goto, wait, keypress). This means drag validation errors are returned with HTTP 200 and no structured code/retryable/details fields, bypassing the new buildErrorResponse error classification in the server's catch block. The Python client won't format these errors with the new repairable feedback, and the model won't receive the same structured validation details it gets for other actions.
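
For illustration, here is a Python sketch of what moving drag onto the structured pattern could look like: collect per-field problems and raise one typed error instead of returning a failure object. The real handler is TypeScript, and the shape of drag's arguments is not shown in this PR excerpt, so the `path` list of `{x, y}` points and the `validate_drag` helper are hypothetical.

```python
class ActionValidationError(Exception):
    """Typed validation failure with per-field details."""

    def __init__(self, action: str, details: dict[str, str]):
        super().__init__(f"Invalid arguments for {action}")
        self.action = action
        self.details = details


def validate_drag(path) -> list[tuple[int, int]]:
    """Validate a hypothetical drag `path` and return it as (x, y) tuples."""
    if not isinstance(path, list) or len(path) < 2:
        raise ActionValidationError(
            "drag", {"path": "must be a list of at least two points"}
        )
    details: dict[str, str] = {}
    for i, point in enumerate(path):
        for axis in ("x", "y"):
            value = point.get(axis) if isinstance(point, dict) else None
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                details[f"path[{i}].{axis}"] = "must be an integer pixel coordinate"
    if details:
        raise ActionValidationError("drag", details)
    return [(p["x"], p["y"]) for p in path]
```

Because the error is raised rather than returned, it would reach the same catch block as the other actions and be classified with the same code, `retryable`, and details fields.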

Additional Locations (1)


CUA_SYSTEM_PROMPT_TEMPLATE = """You are a browser automation agent. You can control a web browser using the provided tools.

The display resolution is {viewport_width}x{viewport_height} pixels.
Use integer pixel coordinates measured from the top-left corner of the page.
Example prompt duplicates and diverges from default CUA prompt

Low Severity

CUA_SYSTEM_PROMPT_TEMPLATE in the example and _build_default_cua_system_prompt in browser_env.py both build viewport-aware CUA system prompts with overlapping but divergent content. The example's version hardcodes a tool list while the default version omits it; both inject {viewport_width}x{viewport_height}. Updating one without the other will silently introduce inconsistencies for users who switch between the example and direct BrowserEnv usage.

Additional Locations (1)


CUA_SYSTEM_PROMPT = """You are a browser automation agent. You can control a web browser using the provided tools.
CUA_SYSTEM_PROMPT_TEMPLATE = """You are a browser automation agent. You can control a web browser using the provided tools.

The display resolution is {viewport_width}x{viewport_height} pixels.
Collaborator

this stuff should be in the tool definition not system prompt

self.backoff_factor = backoff_factor
self.max_backoff_seconds = max_backoff_seconds
self.jitter = jitter
self.session_create_max_retries = (
Collaborator

can probably just set all these things to some value that has a default instead of all the if else (same for the stuff below)
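
One way to read this suggestion, sketched under assumptions: give every knob a concrete default in a small policy object, then derive the session-creation policy from the request policy instead of branching on `None` for each field. `RetryPolicy`, `base_delay`, and all default values here are hypothetical; only `backoff_factor`, `max_backoff_seconds`, `jitter`, and `session_create_max_retries` echo names from the diff.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical retry settings; every field has a usable default."""

    max_retries: int = 3
    base_delay: float = 0.5
    backoff_factor: float = 2.0
    max_backoff_seconds: float = 30.0
    jitter: float = 0.1


# Ordinary CUA requests use the defaults as-is.
REQUEST_RETRY = RetryPolicy()

# Session creation overrides only what differs and inherits the rest.
SESSION_CREATE_RETRY = replace(REQUEST_RETRY, max_retries=5, max_backoff_seconds=60.0)
```

With this shape, a constructor could accept two `RetryPolicy` arguments with default instances, and no per-field `if ... is None` fallback would be needed.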

ModeType = Literal["dom", "cua"]


def _build_default_cua_system_prompt(viewport_width: int, viewport_height: int) -> str:
Collaborator

should leave system prompt stuff to the downstream environment not in the base browserenv

return dumped
return None

async def env_response(
Collaborator

what is all this new env_response stuff for?
