v0.2.0 release
Hello everyone! Thank you for your attention to ROLL.
ROLL has recently shipped a large number of new features. Below is a summary of the recent updates; we will continue to iterate on ROLL, and you are welcome to join the ROLL community.
🚀 Highlights:
- New model support: Qwen3-VL, Qwen3-MoE-VL, Qwen3-Omni, GLM-4.7
- Partial GPU overlap between agentic training and rollout, switching idle training GPUs over to rollout
- DynamicSamplingScheduler coroutine refactoring
- New: FSDP2 Strategy
- Training supports sequence packing and dynamic batching
🚀 Major New Features:
- Rollout
- DynamicSamplingScheduler coroutine refactoring
- Custom rollout pre-/post-processing, supporting dynamic sampling params, multi-stage generation, and ThinkingBudget control
- SGLang: strategy refactoring, supporting server mode, native onload/offload, in-flight FP8-quantized rollout, and cross-machine multi-node deployment
- vLLM: DP/EP support; compatible with vllm==0.12.0
- New AgentNative Rollout paradigm (AgentNativeStepEnvManager + SokobanNativeEnv), with context fully managed by the env
- Async rollout hang detection, to quickly locate problematic envs
- Rollout dump & mock support, making precision alignment between the forward and train phases more efficient
- Agentic pipeline supports train-val/rollout overlap
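To make the coroutine-based dynamic sampling above concrete, here is a minimal asyncio sketch of a scheduler that keeps rollouts in flight and filters finished reward groups until a batch is full. All names here (`rollout_group`, `collect_batch`, the zero-variance filter) are illustrative assumptions for the sketch, not ROLL's actual API.

```python
import asyncio
import random

async def rollout_group(prompt_id: int, group_size: int = 4) -> list[float]:
    """Stand-in for a rollout engine call: generate and score a group."""
    await asyncio.sleep(0)  # yield control, as a real engine call would
    rng = random.Random(prompt_id)
    return [float(rng.randint(0, 1)) for _ in range(group_size)]

def keep(rewards: list[float]) -> bool:
    """Dynamic-sampling filter: drop groups whose rewards are all identical."""
    return len(set(rewards)) > 1

async def collect_batch(target: int, max_inflight: int = 8) -> list[list[float]]:
    batch: list[list[float]] = []
    pending: set[asyncio.Task] = set()
    next_prompt = 0
    while len(batch) < target:
        # Top up the in-flight rollouts.
        while len(pending) < max_inflight:
            pending.add(asyncio.create_task(rollout_group(next_prompt)))
            next_prompt += 1
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            rewards = task.result()
            if keep(rewards):
                batch.append(rewards)
    for task in pending:  # cancel rollouts that are no longer needed
        task.cancel()
    return batch[:target]

batch = asyncio.run(collect_batch(target=16))
```

Running rollouts as coroutines lets new prompts be scheduled the moment any group finishes, instead of waiting on the slowest member of a synchronous batch.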
- Training
- FSDP2
- Megatron now supports LoRA; LoRA RL blog: https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus
- Online saving of model parameters in HF format during Megatron training
- FP8 training support for the Megatron strategy
- Sequence packing, with a refined loss_func interface definition
- Dynamic batching
- Added DeepSpeed SFT support
- Model update implementation optimizations: eliminate inter-machine redundancy, overlap weight conversion with the NCCL broadcast, optimize host-to-device transfer, and replace the serial synchronization of multiple PP stages with a lock-based mode that synchronizes them simultaneously
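As background on the sequence-packing item above: packing concatenates variable-length sequences into one flat buffer and records their boundaries as cumulative lengths (commonly called `cu_seqlens`), so attention kernels and a per-sequence loss function can still treat each original sequence separately while wasting no padding tokens. A minimal illustrative sketch (not ROLL's implementation):

```python
def pack_sequences(seqs: list[list[int]]) -> tuple[list[int], list[int]]:
    """Concatenate sequences into one buffer, tracking cumulative boundaries."""
    flat: list[int] = []
    cu_seqlens = [0]
    for seq in seqs:
        flat.extend(seq)
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return flat, cu_seqlens

def unpack(flat: list[int], cu_seqlens: list[int]) -> list[list[int]]:
    """Recover the original sequences from boundary offsets."""
    return [flat[a:b] for a, b in zip(cu_seqlens, cu_seqlens[1:])]

seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
flat, cu = pack_sequences(seqs)
# flat == [1, 2, 3, 4, 5, 6, 7, 8, 9]; cu == [0, 3, 5, 9]
```

Compared with padding every sequence to the batch maximum, the packed buffer keeps GPU utilization constant regardless of how skewed the length distribution is.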
- Asynchronous Feature
- Partial GPU overlap between training and rollout, switching idle training GPUs over to rollout; report: https://arxiv.org/abs/2512.24873
- Agentic off-policy loss with importance-sampling (IS) correction
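For the IS-corrected off-policy loss, one common form reweights each token's policy-gradient term by the ratio of the training policy to the (stale) rollout policy, truncated to bound variance. The sketch below is hedged: the function name, the REINFORCE-style surrogate, and the V-trace-style truncation constant `c` are assumptions for illustration, not ROLL's exact loss.

```python
import math

def is_corrected_loss(
    logp_train: list[float],
    logp_rollout: list[float],
    advantages: list[float],
    c: float = 2.0,
) -> float:
    """Policy-gradient loss with truncated importance-sampling correction.

    Each token's contribution is scaled by min(pi_train / pi_rollout, c),
    compensating for the mismatch between the stale rollout policy and
    the current training policy.
    """
    loss = 0.0
    for lt, lr, adv in zip(logp_train, logp_rollout, advantages):
        ratio = min(math.exp(lt - lr), c)  # truncated IS weight
        loss += -ratio * adv * lt          # policy-gradient surrogate
    return loss / len(advantages)
```

When rollout and training policies coincide, the ratio is 1 and the loss reduces to the ordinary on-policy surrogate; the truncation only engages when the policies have drifted apart.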
- Pipeline recipe
- VLM image tool use: DeepEyes, with tool invocation overlapped with reward calculation
- Models: New model support for Qwen3-VL, Qwen3-MoE-VL, Qwen3-Omni-Thinker, GLM-4.7