v0.2.0 release
Hello everyone! Thank you for your attention to ROLL.
ROLL has recently shipped a large number of new features. Below is a summary of the recent updates; we will continue to iterate on ROLL, and you are welcome to join the ROLL community.
🚀 Highlights:
- New model support: Qwen3-VL, Qwen3-MoE-VL, Qwen3-Omni, GLM-4.7
- Partial GPU overlap between agentic training and rollout, switching idle training GPUs over to rollout
- DynamicSamplingScheduler coroutine refactoring
- New: FSDP2 Strategy
- Training supports sequence packing and dynamic batching
🚀 Major New Features:
- Rollout
- DynamicSamplingScheduler coroutine refactoring
- Custom rollout pre-/post-processing, supporting dynamic sampling params, multi-stage generation, and ThinkingBudget control
- SGLang: strategy refactoring, supporting server mode, native onload/offload, in-flight FP8-quantized rollout, and cross-machine multi-node deployment
- vLLM: DP/EP support; compatible with vllm==0.12.0
- New AgentNative Rollout paradigm (AgentNativeStepEnvManager + SokobanNativeEnv), with context fully managed by the env
- Async rollout hang detection, to quickly locate problematic envs
- Rollout dump & mock support, making precision alignment between the forward and train phases more efficient
- Agentic pipeline supports train-val/rollout overlap
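To make the coroutine-based dynamic sampling above concrete, here is a minimal asyncio sketch of a scheduler that keeps rollouts in flight and filters finished reward groups until a batch is full. All names here (`rollout_group`, `collect_batch`, the zero-variance filter) are illustrative assumptions for the sketch, not ROLL's actual API.

```python
import asyncio
import random

async def rollout_group(prompt_id: int, group_size: int = 4) -> list[float]:
    """Stand-in for a rollout engine call: generate and score a group."""
    await asyncio.sleep(0)  # yield control, as a real engine call would
    rng = random.Random(prompt_id)
    return [float(rng.randint(0, 1)) for _ in range(group_size)]

def keep(rewards: list[float]) -> bool:
    """Dynamic-sampling filter: drop groups whose rewards are all identical."""
    return len(set(rewards)) > 1

async def collect_batch(target: int, max_inflight: int = 8) -> list[list[float]]:
    batch: list[list[float]] = []
    pending: set[asyncio.Task] = set()
    next_prompt = 0
    while len(batch) < target:
        # Top up the in-flight rollouts.
        while len(pending) < max_inflight:
            pending.add(asyncio.create_task(rollout_group(next_prompt)))
            next_prompt += 1
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED
        )
        for task in done:
            rewards = task.result()
            if keep(rewards):
                batch.append(rewards)
    for task in pending:  # cancel rollouts that are no longer needed
        task.cancel()
    return batch[:target]

batch = asyncio.run(collect_batch(target=16))
```

Running rollouts as coroutines lets new prompts be scheduled the moment any group finishes, instead of waiting on the slowest member of a synchronous batch.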
- Training
- FSDP2
- Megatron now supports LoRA; LoRA RL blog: https://macaron.im/mindlab/research/building-trillion-parameter-reasoning-rl-with-10-gpus
- Online saving of model parameters in HF format during Megatron training
- FP8 training support for the Megatron strategy
- Sequence packing, with a refined loss_func interface definition
- Dynamic batching
- Added DeepSpeed SFT support
- Model update implementation optimizations: eliminate inter-machine redundancy, overlap weight conversion with the NCCL broadcast, optimize host-to-device transfer, and replace the serial synchronization of multiple PP stages with a lock-based mode that synchronizes them simultaneously
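As background on the sequence-packing item above: packing concatenates variable-length sequences into one flat buffer and records their boundaries as cumulative lengths (commonly called `cu_seqlens`), so attention kernels and a per-sequence loss function can still treat each original sequence separately while wasting no padding tokens. A minimal illustrative sketch (not ROLL's implementation):

```python
def pack_sequences(seqs: list[list[int]]) -> tuple[list[int], list[int]]:
    """Concatenate sequences into one buffer, tracking cumulative boundaries."""
    flat: list[int] = []
    cu_seqlens = [0]
    for seq in seqs:
        flat.extend(seq)
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return flat, cu_seqlens

def unpack(flat: list[int], cu_seqlens: list[int]) -> list[list[int]]:
    """Recover the original sequences from boundary offsets."""
    return [flat[a:b] for a, b in zip(cu_seqlens, cu_seqlens[1:])]

seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
flat, cu = pack_sequences(seqs)
# flat == [1, 2, 3, 4, 5, 6, 7, 8, 9]; cu == [0, 3, 5, 9]
```

Compared with padding every sequence to the batch maximum, the packed buffer keeps GPU utilization constant regardless of how skewed the length distribution is.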
- Asynchronous Feature
- Partial GPU overlap between training and rollout, switching idle training GPUs over to rollout; report: https://arxiv.org/abs/2512.24873
- Agentic off-policy loss with importance-sampling (IS) correction
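For the IS-corrected off-policy loss, one common form reweights each token's policy-gradient term by the ratio of the training policy to the (stale) rollout policy, truncated to bound variance. The sketch below is hedged: the function name, the REINFORCE-style surrogate, and the V-trace-style truncation constant `c` are assumptions for illustration, not ROLL's exact loss.

```python
import math

def is_corrected_loss(
    logp_train: list[float],
    logp_rollout: list[float],
    advantages: list[float],
    c: float = 2.0,
) -> float:
    """Policy-gradient loss with truncated importance-sampling correction.

    Each token's contribution is scaled by min(pi_train / pi_rollout, c),
    compensating for the mismatch between the stale rollout policy and
    the current training policy.
    """
    loss = 0.0
    for lt, lr, adv in zip(logp_train, logp_rollout, advantages):
        ratio = min(math.exp(lt - lr), c)  # truncated IS weight
        loss += -ratio * adv * lt          # policy-gradient surrogate
    return loss / len(advantages)
```

When rollout and training policies coincide, the ratio is 1 and the loss reduces to the ordinary on-policy surrogate; the truncation only engages when the policies have drifted apart.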
- Pipeline recipe
- VLM image tool use: DeepEyes, with tool invocation overlapped with reward calculation
- Models: New model support for Qwen3-VL, Qwen3-MoE-VL, Qwen3-Omni-Thinker, GLM-4.7