Bug: Nested EnvGroup task routing is silently broken
When an EnvGroup is wrapped inside another EnvGroup (as prime-rl does when it wraps user envs), the outer group overwrites the task column with a single name. The inner EnvGroup then fails to route, because its env_map is keyed by the original task names: it silently falls back to envs[0] for rollouts and returns reward=0.0 for scoring.
This means only the first sub-environment's rubric ever executes. No error is raised to tell the user that the rest of their tasks are being misrouted or scored as zero; the rollout path logs nothing, and the scoring path only logs a warning.
Impact
If a user builds a multi-task environment with EnvGroup (e.g., 17 tasks with different rubrics) and registers it as a single [[orchestrator.env]] in prime-rl, only the first task type gets a meaningful reward signal. The model trains with zero reward on all other task types. No error is raised; the only visible symptom is that the wandb metrics show only the first rubric's reward name.
Problematic code path
1. Task name overwriting (EnvGroup.__init__, lines 172-182)
```python
for env, name in zip(self.envs, self.env_names):
    add_task = make_add_task_fn(name)
    env_dataset = env.build_dataset()
    if env_dataset is not None:
        if "task" in env_dataset.column_names:
            env_dataset = env_dataset.remove_columns(["task"])  # destroys inner task names
        env_dataset = env_dataset.map(add_task, **map_kwargs)  # overwrites with outer name
```
When the sub-env is itself an EnvGroup, its dataset already has meaningful per-task names. The outer EnvGroup unconditionally destroys these and replaces them all with a single name.
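To make the collapse concrete, here is a toy model in plain Python. It assumes a dataset can be treated as a list of row dicts; `relabel_rows` is a hypothetical helper that imitates the `remove_columns(["task"])` plus `map(add_task)` pair from the loop above, not the real `datasets` calls.

```python
from typing import Dict, List

# Toy model only: a "dataset" is a list of row dicts. The real EnvGroup uses
# datasets.Dataset.remove_columns / .map, which this function merely imitates.
Row = Dict[str, str]


def relabel_rows(rows: List[Row], outer_name: str) -> List[Row]:
    """Hypothetical helper: drop any existing 'task' value, then stamp the
    outer group's name on every row (what the current loop does per sub-env)."""
    relabeled: List[Row] = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != "task"}
        new_row["task"] = outer_name
        relabeled.append(new_row)
    return relabeled


# Rows as an inner EnvGroup with two tasks would label them.
inner_rows = [
    {"question": "q1", "task": "math"},
    {"question": "q2", "task": "code"},
]

outer_rows = relabel_rows(inner_rows, "my_env")
print(sorted({row["task"] for row in outer_rows}))  # ['my_env']
```

Two distinct task labels go in and only one comes out, and those lost labels are exactly what the inner group's env_map needs for routing.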
2. Silent fallback in get_env_for_task (lines 318-319)
```python
def get_env_for_task(self, task: str) -> vf.Environment:
    return self.env_map.get(task, self.envs[0])  # silent fallback!
```
When the inner EnvGroup receives the outer name (which doesn't exist in its env_map), it silently falls back to envs[0]. All rollouts go to the first sub-environment. No warning is logged.
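The two-level routing failure can be reproduced with a routing-only toy. `FallbackToyGroup` is hypothetical: it keeps just an env_map and the current `dict.get(task, envs[0])` lookup, with strings standing in for leaf environments.

```python
from typing import Any, Dict, List


class FallbackToyGroup:
    """Hypothetical routing-only stand-in for EnvGroup with the current
    get_env_for_task behaviour (no datasets, rubrics, or rollouts)."""

    def __init__(self, envs: List[Any], env_names: List[str]):
        self.envs = envs
        self.env_map: Dict[str, Any] = dict(zip(env_names, envs))

    def get_env_for_task(self, task: str) -> Any:
        # Unknown task names silently fall back to the first sub-env.
        return self.env_map.get(task, self.envs[0])


inner = FallbackToyGroup(envs=["math_env", "code_env"], env_names=["math", "code"])
outer = FallbackToyGroup(envs=[inner], env_names=["my_env"])

# After the outer relabeling, a row that was originally "code" arrives as "my_env".
routed = outer.get_env_for_task("my_env").get_env_for_task("my_env")
print(routed)  # math_env
```

The outer lookup succeeds, so nothing looks wrong at the top level; the misrouting happens one level down, where the default argument hides the miss.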
3. Scoring returns zeros for unknown tasks (EnvGroupRubric.score_group, lines 86-94)
```python
task = states[0].get("task", "default")
env = self.env_map.get(task)
if env is None:
    self.logger.warning(f"No environment found for task '{task}'")
    for state in states:
        state["reward"] = 0.0  # all tasks scored as zero
```
The inner EnvGroupRubric can't find the outer name in its env_map, so all states get reward=0.0.
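The scoring side can be mirrored the same way. This is a hypothetical sketch, assuming each sub-env can be reduced to a plain reward function; `score_group_toy` copies the unknown-task branch quoted above, and its known-task branch is a simplification, not the real rubric call.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("toy_env_group_rubric")

State = Dict[str, Any]


def score_group_toy(env_map: Dict[str, Callable[[State], float]], states: List[State]) -> None:
    """Hypothetical mirror of the EnvGroupRubric.score_group branch: the whole
    group is routed by the task of its first state."""
    task = states[0].get("task", "default")
    reward_fn = env_map.get(task)
    if reward_fn is None:
        logger.warning(f"No environment found for task '{task}'")
        for state in states:
            state["reward"] = 0.0
        return
    # Simplification: the real code delegates to the sub-env's rubric here.
    for state in states:
        state["reward"] = reward_fn(state)


inner_env_map = {"math": lambda state: 0.8, "code": lambda state: 0.6}

# Both states were "code" rows before the outer group relabeled them.
states = [{"task": "my_env"}, {"task": "my_env"}]
score_group_toy(inner_env_map, states)
print([state["reward"] for state in states])  # [0.0, 0.0]
```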
Proposed fix
A. Preserve inner task names when sub-env is an EnvGroup
In EnvGroup.__init__, when a sub-env is itself an EnvGroup, preserve its internal task names instead of overwriting them. Register each inner task name in the outer env_map pointing to the sub-env:
```python
for env, name in zip(self.envs, self.env_names):
    add_task = make_add_task_fn(name)
    env_dataset = env.build_dataset()
    if env_dataset is not None:
        if isinstance(env, EnvGroup) and "task" in env_dataset.column_names:
            # Preserve inner EnvGroup's task names for correct routing
            for inner_name in env.env_names:
                self.env_map[inner_name] = env
            # Don't overwrite task column; inner tasks are already set
        else:
            if "task" in env_dataset.column_names:
                env_dataset = env_dataset.remove_columns(["task"])
            env_dataset = env_dataset.map(add_task, **map_kwargs)
        datasets.append(env_dataset)
```
Same pattern for eval_dataset handling.
This way:
- Inner task names are preserved in the dataset.
- The outer env_map maps each inner task name to the inner EnvGroup.
- Routing works: the outer group receives a task name and routes to the inner EnvGroup, which routes to the correct sub-env.
- EnvGroupRubric also gets the updated env_map, so scoring routes correctly.
- results_df.task.nunique() > 1, enabling per-task logging in orchestrators like prime-rl.
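A routing-only toy model of fix A, assuming a leaf env can be reduced to a label plus some rows. `ToyLeaf`, `ToyGroup`, and `resolve` are hypothetical names; they model the dataset labeling and env_map registration (with the strict lookup from fix B), not rubrics, rollouts, or eval datasets.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Row = Dict[str, str]


@dataclass
class ToyLeaf:
    """Hypothetical single-task env: a label and some rows."""
    label: str
    rows: List[Row] = field(default_factory=list)

    def build_dataset(self) -> List[Row]:
        return [dict(row) for row in self.rows]


class ToyGroup:
    """Hypothetical stand-in for EnvGroup modelling only the proposed fix."""

    def __init__(self, envs: List[Any], env_names: List[str]):
        self.envs = envs
        self.env_names = env_names
        self.env_map: Dict[str, Any] = dict(zip(env_names, envs))
        self.dataset: List[Row] = []
        for env, name in zip(envs, env_names):
            rows = env.build_dataset()
            if isinstance(env, ToyGroup):
                # Keep inner task labels and route each of them to the inner group.
                for inner_name in env.env_names:
                    self.env_map[inner_name] = env
            else:
                # Leaf env: set (or overwrite) the task label with this group's name.
                rows = [{**row, "task": name} for row in rows]
            self.dataset.extend(rows)

    def build_dataset(self) -> List[Row]:
        return [dict(row) for row in self.dataset]

    def get_env_for_task(self, task: str) -> Any:
        env = self.env_map.get(task)
        if env is None:
            raise ValueError(f"No environment found for task '{task}'. Available tasks: {list(self.env_map)}")
        return env


def resolve(group: ToyGroup, task: str) -> Any:
    """Follow routing through nested groups until a leaf env is reached."""
    env = group.get_env_for_task(task)
    while isinstance(env, ToyGroup):
        env = env.get_env_for_task(task)
    return env


math_env = ToyLeaf("math_env", [{"question": "q1"}])
code_env = ToyLeaf("code_env", [{"question": "q2"}])
inner = ToyGroup([math_env, code_env], ["math", "code"])
outer = ToyGroup([inner], ["my_env"])

print(sorted({row["task"] for row in outer.dataset}))  # ['code', 'math']
print(resolve(outer, "code").label)  # code_env
```

Because each row keeps its inner label, the same task string is valid at every level, so routing can simply be repeated until it reaches a leaf.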
B. Remove silent fallback in get_env_for_task
```python
def get_env_for_task(self, task: str) -> vf.Environment:
    env = self.env_map.get(task)
    if env is None:
        available = list(self.env_map.keys())
        raise ValueError(
            f"No environment found for task '{task}'. "
            f"Available tasks: {available}"
        )
    return env
```
The current fallback to envs[0] silently masks routing failures. An explicit error makes misconfigurations immediately visible.
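With the strict lookup, the nested misconfiguration from this issue fails on the first routed row instead of training on the wrong env. A hypothetical routing-only sketch (`StrictToyGroup` is not a real class; strings stand in for leaf envs), assuming fix A has not been applied so every row still arrives as "my_env":

```python
from typing import Any, Dict, List


class StrictToyGroup:
    """Hypothetical routing-only stand-in for EnvGroup with the proposed
    get_env_for_task: unknown task names raise instead of falling back."""

    def __init__(self, envs: List[Any], env_names: List[str]):
        self.envs = envs
        self.env_map: Dict[str, Any] = dict(zip(env_names, envs))

    def get_env_for_task(self, task: str) -> Any:
        env = self.env_map.get(task)
        if env is None:
            available = list(self.env_map.keys())
            raise ValueError(
                f"No environment found for task '{task}'. "
                f"Available tasks: {available}"
            )
        return env


inner = StrictToyGroup(envs=["math_env", "code_env"], env_names=["math", "code"])
outer = StrictToyGroup(envs=[inner], env_names=["my_env"])

# Without fix A, every row is relabeled "my_env", so the inner lookup fails loudly.
try:
    outer.get_env_for_task("my_env").get_env_for_task("my_env")
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

The error names the inner group's real task names, which points directly at the relabeling.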
Test changes
1. Update test_get_env_for_task: unknown task should raise, not fall back
```python
def test_get_env_for_task(self, mock_openai_client):
    # ... (same setup as current) ...
    env_group = EnvGroup(envs=[env1, env2], env_names=["math", "code"])
    assert env_group.get_env_for_task("math") == env1
    assert env_group.get_env_for_task("code") == env2

    # Unknown task should raise, not silently fall back
    with pytest.raises(ValueError, match="No environment found for task"):
        env_group.get_env_for_task("unknown")
```
2. Add test_nested_env_group_preserves_inner_tasks
```python
def test_nested_env_group_preserves_inner_tasks(self, mock_openai_client):
    """Test that wrapping an EnvGroup in another EnvGroup preserves inner task names."""
    env1 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q1"], "answer": ["a1"]}),
        rubric=Rubric(),
    )
    env2 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q2"], "answer": ["a2"]}),
        rubric=Rubric(),
    )
    inner_group = EnvGroup(envs=[env1, env2], env_names=["math", "code"])
    outer_group = EnvGroup(envs=[inner_group], env_names=["my_env"])

    # Inner task names should be preserved in the dataset
    dataset = outer_group.get_dataset()
    tasks = dataset["task"]
    assert "math" in tasks
    assert "code" in tasks
    assert "my_env" not in tasks

    # Routing should work through both levels
    assert outer_group.get_env_for_task("math") == inner_group
    assert outer_group.get_env_for_task("code") == inner_group
    assert inner_group.get_env_for_task("code") == env2
```
3. Add test_nested_env_group_rubric_scoring
```python
@pytest.mark.asyncio
async def test_nested_env_group_rubric_scoring(self, mock_openai_client, make_input):
    """Test that scoring routes correctly through nested EnvGroups."""

    def math_reward(completion, **kwargs):
        return 0.8

    def code_reward(completion, **kwargs):
        return 0.6

    env1 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q1"], "answer": ["a1"]}),
        rubric=Rubric(funcs=[math_reward], weights=[1.0]),
    )
    env2 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q2"], "answer": ["a2"]}),
        rubric=Rubric(funcs=[code_reward], weights=[1.0]),
    )
    inner_group = EnvGroup(envs=[env1, env2], env_names=["math", "code"])
    outer_group = EnvGroup(envs=[inner_group], env_names=["my_env"])

    # Score a "code" task: should route through outer -> inner -> env2's rubric
    state = State(input=make_input(prompt="Test", answer="ans", task="code"))
    state["completion"] = "Test completion"
    state["trajectory"] = []
    state["timing"] = {"generation_ms": 0.0, "scoring_ms": 0.0, "total_ms": 0.0, "start_time": 0.0}
    state["is_completed"] = False
    state["stop_condition"] = None
    state["oai_tools"] = []
    state["reward"] = None
    state["metrics"] = None

    await outer_group.rubric.score_rollout(state)
    assert state["reward"] == 0.6  # code_reward, not math_reward
    assert state["metrics"]["code_reward"] == 0.6
    assert state["metrics"]["math_reward"] == 0.0
```
Reproduction
Any environment that returns an EnvGroup from its load_environment() and is registered as a single [[orchestrator.env]] in prime-rl will hit this. The inner task names get overwritten, routing breaks silently, and only the first sub-env's rubric ever scores anything.
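To check whether an existing run is affected, compare the task labels in the results against the tasks you registered. This sketch assumes you can export `(task, reward)` pairs from your rollout results; `find_suspicious_tasks` is a hypothetical helper, not part of EnvGroup or prime-rl.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_suspicious_tasks(
    records: Iterable[Tuple[str, float]], expected_tasks: List[str]
) -> Dict[str, List[str]]:
    """Hypothetical sanity check over (task, reward) pairs from a run.

    Flags expected tasks that never appear (e.g. relabeled by an outer group)
    and observed tasks whose rewards are all exactly 0.0.
    """
    rewards_by_task: Dict[str, List[float]] = defaultdict(list)
    for task, reward in records:
        rewards_by_task[task].append(reward)
    missing = [task for task in expected_tasks if task not in rewards_by_task]
    all_zero = [task for task, rewards in rewards_by_task.items() if all(r == 0.0 for r in rewards)]
    return {"missing": missing, "all_zero": sorted(all_zero)}


# A run affected by this bug: every rollout carries the outer name.
report = find_suspicious_tasks(
    [("my_env", 0.0), ("my_env", 0.0), ("my_env", 0.8)],
    expected_tasks=["math", "code"],
)
print(report)  # {'missing': ['math', 'code'], 'all_zero': []}
```

A non-empty "missing" list is the same signal as results_df.task.nunique() == 1 in this setup: the inner task names never reached the results.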