
feat: add TRL GRPOTrainer rollout_func for WAA environments #127

Merged
abrichr merged 2 commits into main from feat/trl-rollout-func on Mar 18, 2026

@abrichr (Member) commented Mar 18, 2026

Summary

  • make_waa_rollout_func() wraps WAADesktopEnv into TRL's experimental rollout_func API
  • Handles VLM multimodal generation (screenshot → action tokens → logprobs)
  • Automatically uses dense, milestone-based rewards when a TaskConfig defines milestones
  • parse_action_json() handles common VLM quirks (thinking tokens, markdown fences, unknown types)
  • Integrates with TaskConfig YAML tasks and evaluate_dense()
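Dense rewards give GRPO a graded signal instead of a 0/1 outcome, so a group in which every rollout fails can still be ranked rather than yielding zero advantage for all samples. The following is a minimal sketch, assuming a milestone is a boolean check over the observed environment state and partial credit is the fraction of milestones passed. `Milestone` and `dense_reward` are hypothetical names; this is not the PR's `evaluate_dense()`.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Milestone:
    # Hypothetical milestone: a name plus a predicate over an observed env state.
    name: str
    check: Callable[[dict], bool]


def dense_reward(state: dict, milestones: Sequence[Milestone], task_success: bool) -> float:
    """Full credit on success; otherwise the fraction of milestones satisfied."""
    if task_success:
        return 1.0
    if not milestones:
        # No milestones defined: fall back to a sparse 0/1 reward.
        return 0.0
    passed = sum(1 for m in milestones if m.check(state))
    return passed / len(milestones)


# Usage: one of two hypothetical milestones is met, so the reward is 0.5.
milestones = [
    Milestone("notepad_open", lambda s: s.get("notepad_open", False)),
    Milestone("text_typed", lambda s: s.get("text") == "hello"),
]
print(dense_reward({"notepad_open": True}, milestones, task_success=False))  # 0.5
```

In this sketch, an episode that passes every milestone but fails the final check also scores 1.0. A real scheme may want to scale partial credit below the success reward.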

Usage

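from trl import GRPOConfig, GRPOTrainer
from openadapt_evals.training.trl_rollout import make_waa_rollout_func

# WAALiveAdapter, TaskConfig, config and model are assumed to be imported or
# defined elsewhere; their module paths are not shown in this PR.

# Build a rollout function that drives live WAA desktop episodes,
# using the YAML task definitions found in ./tasks/.
rollout_func = make_waa_rollout_func(
    adapter=WAALiveAdapter(config),
    task_configs=TaskConfig.from_dir("./tasks/"),
)

# Pass the custom rollout function to GRPOTrainer (other arguments elided).
trainer = GRPOTrainer(model=model, rollout_func=rollout_func, ...)
trainer.train()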
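A rollout function handed to GRPOTrainer returns token-level data rather than plain text. The sketch below assumes TRL's experimental contract: one entry per sample under `prompt_ids`, `completion_ids` and `logprobs`, with extra keys forwarded to reward functions. The exact call signature has changed between TRL releases, so check the installed version. The sketch is single-step and uses stand-ins: `toy_rollout`, `generate`, `score` and the `env_reward` key are hypothetical. A real WAA rollout would instead loop over screenshot → VLM action → environment step.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one step's vocabulary logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_logprobs(step_logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> List[float]:
    # Log-probability of each generated token under the logits of its own step.
    return [log_softmax(logits)[tok] for logits, tok in zip(step_logits, token_ids)]


# Hypothetical generator: prompt -> (prompt_ids, completion_ids, per-step logits).
Generate = Callable[[str], Tuple[List[int], List[int], List[List[float]]]]


def toy_rollout(prompts: Sequence[str], generate: Generate,
                score: Callable[[str, List[int]], float]) -> Dict[str, list]:
    """Single-step rollout returning one entry per prompt under each key."""
    out: Dict[str, list] = {"prompt_ids": [], "completion_ids": [], "logprobs": [], "env_reward": []}
    for prompt in prompts:
        prompt_ids, completion_ids, step_logits = generate(prompt)
        out["prompt_ids"].append(prompt_ids)
        out["completion_ids"].append(completion_ids)
        out["logprobs"].append(token_logprobs(step_logits, completion_ids))
        # Extra key: assumed to be forwarded to reward functions by the trainer.
        out["env_reward"].append(score(prompt, completion_ids))
    return out


def fake_generate(prompt: str):
    # Stand-in VLM: two action tokens over a 2-token vocabulary with uniform logits.
    return [1, 2, 3], [0, 1], [[0.0, 0.0], [0.0, 0.0]]


batch = toy_rollout(["Open Notepad"], fake_generate, lambda p, ids: 1.0)
print(batch["logprobs"])  # two values of -log(2), about -0.693
```

Recording logprobs at generation time means the trainer can compare them against the current policy without re-running the environment.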

Test plan

  • 15 tests passing (10 parser + 5 integration with mock adapter)
  • GPU integration test with a real VLM (requires a g5.xlarge instance)

🤖 Generated with Claude Code

abrichr and others added 2 commits March 18, 2026 02:43
make_waa_rollout_func() wraps WAADesktopEnv into TRL's experimental
rollout_func API. Handles VLM multimodal generation (screenshot →
action tokens), dense rewards via milestones, and action JSON
parsing with thinking-token tolerance.

Includes parse_action_json() that handles common VLM quirks
(markdown fences, thinking prefixes, unknown action types).

15 tests passing (10 parser + 5 integration with mock adapter).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
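The commit message above says parse_action_json() tolerates thinking prefixes, markdown fences and unknown action types. The following is a hedged sketch of one way to do that, not the PR's implementation. The action vocabulary `KNOWN_TYPES`, the `</think>` delimiter, and the choice to map unknown types to a `"noop"` action are all assumptions, and `parse_action_json_sketch` is a hypothetical name.

```python
import json
import re
from typing import Any, Dict, Optional

# Hypothetical action vocabulary; the PR's real set of action types is not shown.
KNOWN_TYPES = {"click", "type", "key", "scroll", "done"}

# Matches a markdown code fence, optionally tagged "json", and captures its body.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


def parse_action_json_sketch(text: str) -> Optional[Dict[str, Any]]:
    """Extract an action dict from raw VLM output; return None if nothing parses."""
    # 1. Drop any reasoning that ends with a closing </think> tag (assumed delimiter).
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    # 2. Prefer the body of a markdown code fence when one is present.
    fence = _FENCE_RE.search(text)
    if fence:
        text = fence.group(1)
    # 3. Keep only the outermost {...} span to skip surrounding prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        action = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    # 4. Map unknown action types to a no-op instead of failing the whole rollout.
    if action.get("type") not in KNOWN_TYPES:
        return {"type": "noop", "original_type": action.get("type")}
    return action


raw = '<think>The OK button is at (10, 20).</think>\n```json\n{"type": "click", "x": 10, "y": 20}\n```'
print(parse_action_json_sketch(raw))  # {'type': 'click', 'x': 10, 'y': 20}
```

Returning None or a no-op lets the rollout assign a low reward to malformed output instead of raising mid-episode.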
@abrichr merged commit 578985a into main on Mar 18, 2026
1 check passed