EnvGroup nested in outer EnvGroup silently misroutes all tasks to envs[0] #1008

@swpo

Bug: Nested EnvGroup task routing is silently broken

When an EnvGroup is wrapped inside another EnvGroup (as prime-rl does when it wraps user envs), the outer group overwrites the task column with a single name. The inner EnvGroup then fails to route because its env_map is keyed by the original task names: it silently falls back to envs[0] for rollouts and returns reward=0.0 for scoring.

This means only the first sub-environment's rubric ever executes, and the user gets no error or warning that the rest of their tasks are being silently misrouted or scored as zero.

Impact

If a user builds a multi-task environment using EnvGroup (e.g., 17 tasks with different rubrics) and registers it as a single [[orchestrator.env]] in prime-rl, only the first task type gets a meaningful reward signal. The model trains with zero reward on all other task types. No error is raised; the only visible symptom is that wandb metrics show only the first rubric's reward name.

Problematic code path

1. Task name overwriting (EnvGroup.__init__, lines 172-182)

for env, name in zip(self.envs, self.env_names):
    add_task = make_add_task_fn(name)
    env_dataset = env.build_dataset()
    if env_dataset is not None:
        if "task" in env_dataset.column_names:
            env_dataset = env_dataset.remove_columns(["task"])  # destroys inner task names
        env_dataset = env_dataset.map(add_task, **map_kwargs)   # overwrites with outer name

When the sub-env is itself an EnvGroup, its dataset already has meaningful per-task names. The outer EnvGroup unconditionally destroys these and replaces them all with a single name.
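The effect of this step can be reproduced without the library. The following is a minimal sketch that models dataset rows as plain dicts; `relabel` is a hypothetical stand-in for the `remove_columns` + `map(add_task)` pair above, not the actual implementation.

```python
# Toy model: rows are plain dicts, not a real Dataset object.
def relabel(rows, outer_name):
    """Drop any existing "task" key, then stamp every row with outer_name."""
    relabeled = []
    for row in rows:
        stripped = {k: v for k, v in row.items() if k != "task"}
        stripped["task"] = outer_name
        relabeled.append(stripped)
    return relabeled


# Rows as an inner EnvGroup would produce them: one task name per sub-env.
inner_rows = [
    {"question": "q1", "task": "math"},
    {"question": "q2", "task": "code"},
]

# The outer group relabels everything with its own single name.
outer_rows = relabel(inner_rows, "my_env")

inner_tasks = {row["task"] for row in inner_rows}
outer_tasks = {row["task"] for row in outer_rows}
print(inner_tasks, "->", outer_tasks)
```

After relabeling, every row carries the outer name only, so nothing downstream can tell the "math" rows from the "code" rows.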

2. Silent fallback in get_env_for_task (lines 318-319)

def get_env_for_task(self, task: str) -> vf.Environment:
    return self.env_map.get(task, self.envs[0])  # silent fallback!

When the inner EnvGroup receives the outer name (which doesn't exist in its env_map), it silently falls back to envs[0]. All rollouts go to the first sub-environment. No warning is logged.
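To make the failure mode concrete, here is a small sketch of the same `dict.get(key, default)` pattern using strings in place of environments. The names `envs`, `env_map`, and the task values are illustrative only.

```python
# Strings stand in for environment objects in this toy example.
envs = ["math_env", "code_env"]
env_map = {"math": envs[0], "code": envs[1]}


def get_env_for_task(task):
    # Same shape as the current lookup: unknown keys fall back to envs[0].
    return env_map.get(task, envs[0])


# After the outer group relabels the dataset, every row carries "my_env".
incoming_tasks = ["my_env", "my_env", "my_env"]
routed = [get_env_for_task(task) for task in incoming_tasks]
print(routed)
```

Every lookup misses, and every miss resolves to the first environment, so the caller cannot distinguish a routing failure from a legitimate "math" task.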

3. Scoring returns zeros for unknown tasks (EnvGroupRubric.score_group, lines 86-94)

task = states[0].get("task", "default")
env = self.env_map.get(task)
if env is None:
    self.logger.warning(f"No environment found for task '{task}'")
    for state in states:
        state["reward"] = 0.0  # all tasks scored as zero

The inner EnvGroupRubric can't find the outer name in its env_map, so every state gets reward=0.0. The only signal is the logged warning.
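The scoring path can be sketched the same way. This is a simplified, synchronous stand-in for the snippet above: `score_group` here takes a plain dict of hypothetical reward callables and omits logging, so it only illustrates the zero-reward branch.

```python
def score_group(states, rubrics):
    """Score all states with the rubric registered for the first state's task."""
    task = states[0].get("task", "default")
    rubric = rubrics.get(task)
    if rubric is None:
        # Unknown task: every state is scored as zero.
        for state in states:
            state["reward"] = 0.0
        return states
    for state in states:
        state["reward"] = rubric(state)
    return states


# Hypothetical per-task reward functions keyed by the inner task names.
inner_rubrics = {"math": lambda state: 0.8, "code": lambda state: 0.6}

# Task column overwritten by the outer group: lookup misses.
relabeled = score_group([{"task": "my_env"}, {"task": "my_env"}], inner_rubrics)

# Task column preserved: lookup hits the intended rubric.
preserved = score_group([{"task": "code"}], inner_rubrics)
```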

Proposed fix

A. Preserve inner task names when sub-env is an EnvGroup

In EnvGroup.__init__, when a sub-env is itself an EnvGroup, preserve its internal task names instead of overwriting them. Register each inner task name in the outer env_map pointing to the sub-env:

for env, name in zip(self.envs, self.env_names):
    add_task = make_add_task_fn(name)
    env_dataset = env.build_dataset()
    if env_dataset is not None:
        if isinstance(env, EnvGroup) and "task" in env_dataset.column_names:
            # Preserve inner EnvGroup's task names for correct routing
            for inner_name in env.env_names:
                self.env_map[inner_name] = env
            # Don't overwrite task column — inner tasks are already set
        else:
            if "task" in env_dataset.column_names:
                env_dataset = env_dataset.remove_columns(["task"])
            env_dataset = env_dataset.map(add_task, **map_kwargs)
        datasets.append(env_dataset)

The same pattern applies to the eval_dataset handling.

This way:

  • Inner task names are preserved in the dataset
  • Outer env_map maps each inner task name to the inner EnvGroup
  • Routing works: outer gets a task name, routes to inner EnvGroup, inner routes to correct sub-env
  • EnvGroupRubric also gets the updated env_map, so scoring routes correctly
  • results_df.task.nunique() > 1, enabling per-task logging in orchestrators like prime-rl
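The intended two-level routing can be sketched with toy classes. `ToyEnv` and `ToyGroup` are hypothetical and only mirror the env_map registration idea from the proposal; they are not the library's classes. One assumption beyond the proposal: the sketch registers the keys of the child's env_map rather than its env_names, so that deeper nesting would also resolve.

```python
class ToyEnv:
    """Leaf environment identified only by a name."""

    def __init__(self, name):
        self.name = name


class ToyGroup:
    """Group that routes a task name to a child, descending into nested groups."""

    def __init__(self, envs, env_names):
        self.env_map = {}
        for env, name in zip(envs, env_names):
            if isinstance(env, ToyGroup):
                # Preserve the child's task names and point each at the child.
                for inner_name in env.env_map:
                    self.env_map[inner_name] = env
            else:
                self.env_map[name] = env

    def resolve(self, task):
        env = self.env_map[task]  # raises KeyError for unknown tasks
        if isinstance(env, ToyGroup):
            return env.resolve(task)
        return env


math_env, code_env = ToyEnv("math"), ToyEnv("code")
inner = ToyGroup([math_env, code_env], ["math", "code"])
outer = ToyGroup([inner], ["my_env"])

leaf = outer.resolve("code")
print(leaf.name)
```

In this sketch the outer name "my_env" never appears as a task key, which matches the expectation in the proposed test that "my_env" is absent from the task column.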

B. Remove silent fallback in get_env_for_task

def get_env_for_task(self, task: str) -> vf.Environment:
    env = self.env_map.get(task)
    if env is None:
        available = list(self.env_map.keys())
        raise ValueError(
            f"No environment found for task '{task}'. "
            f"Available tasks: {available}"
        )
    return env

The current fallback to envs[0] silently masks routing failures. An explicit error makes misconfigurations immediately visible.

Test changes

1. Update test_get_env_for_task: an unknown task should raise, not fall back

def test_get_env_for_task(self, mock_openai_client):
    # ... (same setup as current) ...
    env_group = EnvGroup(envs=[env1, env2], env_names=["math", "code"])

    assert env_group.get_env_for_task("math") == env1
    assert env_group.get_env_for_task("code") == env2
    # Unknown task should raise, not silently fall back
    with pytest.raises(ValueError, match="No environment found for task"):
        env_group.get_env_for_task("unknown")

2. Add test_nested_env_group_preserves_inner_tasks

def test_nested_env_group_preserves_inner_tasks(self, mock_openai_client):
    """Test that wrapping an EnvGroup in another EnvGroup preserves inner task names."""
    env1 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q1"], "answer": ["a1"]}),
        rubric=Rubric(),
    )
    env2 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q2"], "answer": ["a2"]}),
        rubric=Rubric(),
    )

    inner_group = EnvGroup(envs=[env1, env2], env_names=["math", "code"])
    outer_group = EnvGroup(envs=[inner_group], env_names=["my_env"])

    # Inner task names should be preserved in the dataset
    dataset = outer_group.get_dataset()
    tasks = dataset["task"]
    assert "math" in tasks
    assert "code" in tasks
    assert "my_env" not in tasks

    # Routing should work through both levels
    assert outer_group.get_env_for_task("math") == inner_group
    assert outer_group.get_env_for_task("code") == inner_group

3. Add test_nested_env_group_rubric_scoring

@pytest.mark.asyncio
async def test_nested_env_group_rubric_scoring(self, mock_openai_client, make_input):
    """Test that scoring routes correctly through nested EnvGroups."""
    def math_reward(completion, **kwargs):
        return 0.8

    def code_reward(completion, **kwargs):
        return 0.6

    env1 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q1"], "answer": ["a1"]}),
        rubric=Rubric(funcs=[math_reward], weights=[1.0]),
    )
    env2 = SingleTurnEnv(
        client=mock_openai_client,
        model="test-model",
        dataset=Dataset.from_dict({"question": ["q2"], "answer": ["a2"]}),
        rubric=Rubric(funcs=[code_reward], weights=[1.0]),
    )

    inner_group = EnvGroup(envs=[env1, env2], env_names=["math", "code"])
    outer_group = EnvGroup(envs=[inner_group], env_names=["my_env"])

    # Score a "code" task — should route through outer -> inner -> env2's rubric
    state = State(input=make_input(prompt="Test", answer="ans", task="code"))
    state["completion"] = "Test completion"
    state["trajectory"] = []
    state["timing"] = {"generation_ms": 0.0, "scoring_ms": 0.0, "total_ms": 0.0, "start_time": 0.0}
    state["is_completed"] = False
    state["stop_condition"] = None
    state["oai_tools"] = []
    state["reward"] = None
    state["metrics"] = None

    await outer_group.rubric.score_rollout(state)

    assert state["reward"] == 0.6  # code_reward, not math_reward
    assert state["metrics"]["code_reward"] == 0.6
    assert state["metrics"]["math_reward"] == 0.0

Reproduction

Any environment that returns an EnvGroup from its load_environment() and is registered as a single [[orchestrator.env]] in prime-rl will hit this. The inner task names get overwritten, routing breaks silently, and only the first sub-env's rubric ever scores anything.
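A quick way to check whether a given setup is affected is to look at the distinct values in the task column after wrapping. The helper below is a hypothetical diagnostic, assuming the column can be read as a list of strings (as `dataset["task"]` is used in the test above).

```python
from collections import Counter


def task_column_collapsed(tasks, expected_tasks):
    """Return True if any expected task name is missing from the task column."""
    return not set(expected_tasks) <= set(tasks)


# Task column as it looks after the outer group has relabeled every row.
tasks_after_wrapping = ["my_env", "my_env", "my_env"]

counts = Counter(tasks_after_wrapping)
collapsed = task_column_collapsed(tasks_after_wrapping, ["math", "code"])
print(counts, collapsed)
```

If the check reports a collapse, per-task metrics will show a single task regardless of how many sub-environments were configured.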
