Fm/debug examples by filip-michalsky · Pull Request #916 · PrimeIntellect-ai/verifiers

filip-michalsky · 2026-02-15T17:51:56Z

Description

This PR improves the BrowserEnv CUA path and the bundled Browserbase CUA server by making session creation, retries, and sandbox cleanup more resilient, and by surfacing structured
retryable/validation errors back to the caller.

It also adds viewport-aware default CUA prompting and tool descriptions, fixes CUA env_response handling for empty zero-arg tool calls and screenshot-bearing tool messages, and
updates the browser examples/docs to reflect the new behavior and tuning knobs.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Ran uv run pytest tests/test_browser_env.py locally (37 passed in 0.13s).

New coverage was added for:

viewport-aware default CUA prompts and tool descriptions
CUA env_response screenshot relocation and empty tool-arg normalization
session-create retry classification and structured error formatting
sandbox setup failure cleanup behavior

I also started uv run pytest, but it did not complete during this verification window, so the full-suite checkbox is left unchecked.

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Additional Notes

Key additions include dedicated CUA session-creation retry settings, per-environment CUA request concurrency limiting, bounded server-side session-create queueing, and structured
validation errors that give the model repairable feedback for bad action arguments.

Note

Medium Risk
Touches CUA control flow (session creation, retries, error handling, and sandbox cleanup) across both the server and Python client, which can affect rollout stability and retry behavior if misclassified. Changes are well-scoped and backed by new tests, but concurrency/timeout tuning may need validation under real load.

Overview
CUA server now returns structured, retry-aware errors and validates action arguments. executeAction throws ActionValidationError with per-field details (e.g., integer pixel coords, non-negative wait), and server.ts maps validation/rate-limit/timeout failures to 400/429/504 with retryable flags and state included on action failures.

Session creation is throttled and classified. sessionManager adds bounded concurrent session creation with a queue (CUA_SESSION_CREATE_MAX_CONCURRENT/CUA_SESSION_CREATE_MAX_PENDING) and surfaces typed SessionCreateError codes/statuses.

BrowserEnv CUA path is more resilient and controllable. CUAMode adds separate retry policies for session creation vs requests, parses structured error payloads to decide retryability, caps per-env concurrent CUA requests (cua_max_concurrent_requests), and cleans up partially-created sandboxes on setup failure; tool descriptions and default CUA prompts are updated to include viewport dimensions and coordinate guidance, and env_response normalizes empty zero-arg tool calls and relocates screenshot parts out of tool messages.

Docs/examples are updated accordingly, and new Node + pytest tests cover structured errors, retry classification, prompt/tool descriptions, env_response behavior, and sandbox cleanup.

^{Written by Cursor Bugbot for commit 3f861d5. This will update automatically on new commits. Configure here.}

…example

verifiers/envs/integrations/browser_env/modes/cua_mode.py

verifiers/envs/integrations/browser_env/browser_env.py

pyproject.toml

verifiers/envs/integrations/browser_env/modes/cua_mode.py

pyproject.toml

assets/templates/browserbase/cua/actionExecutor.ts

merge commit

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-09T12:47:31Z

assets/templates/browserbase/cua/actionExecutor.ts

+          x,
+          y,
+          scroll_x,
+          scroll_y,


Drag action skips new structured validation error pattern

Medium Severity

The drag action (and default unknown-action case) still uses the old pattern of returning { success: false, error: "..." } instead of throwing ActionValidationError like all other refactored actions (click, type, scroll, goto, wait, keypress). This means drag validation errors are returned with HTTP 200 and no structured code/retryable/details fields, bypassing the new buildErrorResponse error classification in the server's catch block. The Python client won't format these errors with the new repairable feedback, and the model won't receive the same structured validation details it gets for other actions.

Additional Locations (1)

assets/templates/browserbase/cua/server.ts#L218-L223

cursor · 2026-03-09T12:47:31Z

environments/browser_cua_example/browser_cua_example.py

+CUA_SYSTEM_PROMPT_TEMPLATE = """You are a browser automation agent. You can control a web browser using the provided tools.
+
+The display resolution is {viewport_width}x{viewport_height} pixels.
+Use integer pixel coordinates measured from the top-left corner of the page.


Example prompt duplicates and diverges from default CUA prompt

Low Severity

CUA_SYSTEM_PROMPT_TEMPLATE in the example and _build_default_cua_system_prompt in browser_env.py both build viewport-aware CUA system prompts with overlapping but divergent content. The example's version hardcodes a tool list while the default version omits it; both inject {viewport_width}x{viewport_height}. Updating one without the other will silently introduce inconsistencies for users who switch between the example and direct BrowserEnv usage.

Additional Locations (1)

verifiers/envs/integrations/browser_env/browser_env.py#L17-L27

cdreetz · 2026-03-13T20:48:42Z

environments/browser_cua_example/browser_cua_example.py

-CUA_SYSTEM_PROMPT = """You are a browser automation agent. You can control a web browser using the provided tools.
+CUA_SYSTEM_PROMPT_TEMPLATE = """You are a browser automation agent. You can control a web browser using the provided tools.
+
+The display resolution is {viewport_width}x{viewport_height} pixels.


this stuff should be in the tool definition not system prompt

cdreetz · 2026-03-13T20:52:14Z

verifiers/envs/integrations/browser_env/modes/cua_mode.py

        self.backoff_factor = backoff_factor
        self.max_backoff_seconds = max_backoff_seconds
        self.jitter = jitter
+        self.session_create_max_retries = (


can probably just set all these things to some value that has a default instead of all the if else (same for the stuff below)

cdreetz · 2026-03-13T20:58:12Z

verifiers/envs/integrations/browser_env/browser_env.py

 ModeType = Literal["dom", "cua"]


+def _build_default_cua_system_prompt(viewport_width: int, viewport_height: int) -> str:


should leave system prompt stuff to the downstream environment not in the base browserenv

cdreetz · 2026-03-13T20:58:34Z

verifiers/envs/integrations/browser_env/browser_env.py

+                return dumped
+        return None
+
+    async def env_response(


what is all this new env resposne stuff for

filip-michalsky added 2 commits February 15, 2026 11:18

fall back to bb project id in env - devX improvement for running DOM …

a714cff

…example

ruff

7d80b80

cursor bot reviewed Feb 15, 2026

View reviewed changes

verifiers/envs/integrations/browser_env/modes/cua_mode.py Show resolved Hide resolved

verifiers/envs/integrations/browser_env/browser_env.py Outdated Show resolved Hide resolved

filip-michalsky added 6 commits February 15, 2026 19:01

simplify fix

6da43f3

fix ty

4b33baf

ruff

fa40f0b

reduce OOM issues

395c79b

fix memory leak

29f99e5

ty fix

a62cc90

cursor bot reviewed Feb 20, 2026

View reviewed changes

pyproject.toml Outdated Show resolved Hide resolved

verifiers/envs/integrations/browser_env/modes/cua_mode.py Show resolved Hide resolved

cdreetz approved these changes Mar 6, 2026

View reviewed changes

merge tip of latest main

e75658d

cursor bot reviewed Mar 8, 2026

View reviewed changes

verifiers/envs/integrations/browser_env/modes/cua_mode.py Show resolved Hide resolved

enforce integer, dims handling

6a2812d

cursor bot reviewed Mar 8, 2026

View reviewed changes

pyproject.toml Outdated Show resolved Hide resolved

filip-michalsky added 2 commits March 8, 2026 20:41

make bugbot happier

2e725c0

add window dims to cua example

744c084

cursor bot reviewed Mar 9, 2026

View reviewed changes

assets/templates/browserbase/cua/actionExecutor.ts Show resolved Hide resolved

filip-michalsky added 2 commits March 9, 2026 09:30

Merge branch 'main' into fm/debug-examples

efe0a98

merge commit

make bugbot happy

3f861d5

cursor bot reviewed Mar 9, 2026

View reviewed changes

cdreetz reviewed Mar 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fm/debug examples#916

Fm/debug examples#916
filip-michalsky wants to merge 14 commits intoPrimeIntellect-ai:mainfrom
filip-michalsky:fm/debug-examples

filip-michalsky commented Feb 15, 2026 •

edited by cursor bot

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Uh oh!

cursor bot Mar 9, 2026

Uh oh!

cursor bot Mar 9, 2026

Uh oh!

cdreetz Mar 13, 2026

Uh oh!

cdreetz Mar 13, 2026

Uh oh!

cdreetz Mar 13, 2026

Uh oh!

cdreetz Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		ModeType = Literal["dom", "cua"]


		def _build_default_cua_system_prompt(viewport_width: int, viewport_height: int) -> str:

Conversation

filip-michalsky commented Feb 15, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Testing

Checklist

Additional Notes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor bot Mar 9, 2026

Choose a reason for hiding this comment

Drag action skips new structured validation error pattern

Uh oh!

cursor bot Mar 9, 2026

Choose a reason for hiding this comment

Example prompt duplicates and diverges from default CUA prompt

Uh oh!

cdreetz Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

cdreetz Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

cdreetz Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

cdreetz Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

filip-michalsky commented Feb 15, 2026 •

edited by cursor bot

Loading