Bug: Bridge mode Ralph-loop gating causes daemon to stall after no-change cycles #556

@wangsheng520520

Summary

When EVOLVE_BRIDGE=true and a cycle completes without producing file changes (no solidification needed), the daemon enters a permanent Ralph-loop. last_solidify.run_id never catches up to last_run.run_id, so isPendingSolidify() returns true indefinitely. The daemon sleeps 60s and retries forever, and no subsequent cycle ever runs.

Environment

  • Evolver version: v1.87.2
  • Mode: --loop daemon with EVOLVE_BRIDGE=true
  • Cycle timeout: default
  • Strategy: balanced

Root Cause Analysis

The gating loop in index.js (lines 564-568):

const st0 = readJsonSafe(solidifyStatePath);
if (isPendingSolidify(st0)) {
  await sleepMs(Math.max(pendingSleepMs, minSleepMs));
  continue;
}

isPendingSolidify() (lines 93-101):

function isPendingSolidify(state) {
  const lastRun = state && state.last_run ? state.last_run : null;
  const lastSolid = state && state.last_solidify ? state.last_solidify : null;
  if (!lastRun || !lastRun.run_id) return false;
  if (!lastSolid || !lastSolid.run_id) return true;
  return String(lastSolid.run_id) !== String(lastRun.run_id);
}
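
To make the possible outcomes concrete, here is a small sketch that copies the function above and runs it against hand-written states. The run_id values are invented for illustration and do not come from a real state file.

```javascript
// Copied verbatim from the issue text above.
function isPendingSolidify(state) {
  const lastRun = state && state.last_run ? state.last_run : null;
  const lastSolid = state && state.last_solidify ? state.last_solidify : null;
  if (!lastRun || !lastRun.run_id) return false;
  if (!lastSolid || !lastSolid.run_id) return true;
  return String(lastSolid.run_id) !== String(lastRun.run_id);
}

// Hypothetical states; the run_id values are made up.
const cases = {
  noState: null,
  neverRan: {},
  ranNeverSolidified: { last_run: { run_id: 'run_2' } },
  inSync: { last_run: { run_id: 'run_2' }, last_solidify: { run_id: 'run_2' } },
  stale: { last_run: { run_id: 'run_2' }, last_solidify: { run_id: 'run_1' } },
};

// Evaluate every case and collect the results by name.
const results = Object.fromEntries(
  Object.entries(cases).map(([name, state]) => [name, isPendingSolidify(state)])
);
console.log(results);
```

Only the last two shapes matter for this bug: the gate opens when the two run_id values match and stays closed for as long as they differ.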

The chain of events:

  1. Daemon starts a Bridge cycle → calls evolve.run()
  2. evolve.run() spawns a sub-agent via sessions_spawn and immediately writes a new last_run to evolution_solidify_state.json
  3. The sub-agent is expected to call node index.js solidify on completion, which would update last_solidify
  4. Scenario A: Sub-agent produces "no changes detected" → does not call solidify → last_solidify stays at old value → Ralph-loop
  5. Scenario B: Daemon restarts (e.g., maxCyclesPerProcess threshold, or crash) before sub-agent completes → new daemon finds stale last_solidify → Ralph-loop
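
The chain of events above can be reproduced with a simplified in-memory model. This is a sketch, not the real index.js: simulate() and subAgentSolidifies are hypothetical names, the state lives in an object instead of a file, and the sub-agent is modeled as finishing synchronously.

```javascript
// Same logic as isPendingSolidify() in the issue, written compactly.
function isPendingSolidify(state) {
  const runId = state && state.last_run && state.last_run.run_id;
  const solidId = state && state.last_solidify && state.last_solidify.run_id;
  if (!runId) return false;
  if (!solidId) return true;
  return String(solidId) !== String(runId);
}

// Hypothetical model of the daemon loop (assumption: one loop pass = one sleep or one cycle).
function simulate({ iterations, subAgentSolidifies }) {
  const state = { last_run: null, last_solidify: null };
  let cyclesRun = 0;
  let sleeps = 0;
  for (let i = 0; i < iterations; i++) {
    if (isPendingSolidify(state)) {
      sleeps++; // the gate: sleep and retry without running a cycle
      continue;
    }
    cyclesRun++;
    const runId = `run_${cyclesRun}`;
    state.last_run = { run_id: runId }; // step 2: written as soon as the cycle starts
    if (subAgentSolidifies(cyclesRun)) {
      state.last_solidify = { run_id: runId }; // step 3: only if the sub-agent calls solidify
    }
  }
  return { cyclesRun, sleeps };
}

// Every cycle solidifies: the loop never sleeps on the gate.
const healthy = simulate({ iterations: 10, subAgentSolidifies: () => true });
// The third cycle reports "no changes detected" and skips solidify (Scenario A).
const stalled = simulate({ iterations: 10, subAgentSolidifies: (n) => n < 3 });

console.log({ healthy, stalled });
```

In the stalled run, the loop executes three cycles and then spends every remaining pass sleeping, which matches the observed behavior.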

In the non-Bridge path (lines 614-623):

if (String(process.env.EVOLVE_BRIDGE || '').toLowerCase() === 'false') {
  const stAfterRun = readJsonSafe(solidifyStatePath);
  if (isPendingSolidify(stAfterRun)) {
    const cleared = rejectPendingRun(solidifyStatePath);
    ...
  }
}

The auto-rejection code runs only when EVOLVE_BRIDGE is explicitly set to 'false' (the === 'false' check). When Bridge is enabled, there is no safety net.
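
As a quick check of which settings reach the auto-reject branch, the sketch below lifts the guard expression into a function. autoRejectEnabled is a hypothetical name used only here; the expression itself is the one from the snippet above.

```javascript
// Same expression as the guard above, parameterized on the env value.
function autoRejectEnabled(envValue) {
  return String(envValue || '').toLowerCase() === 'false';
}

// Values process.env.EVOLVE_BRIDGE could plausibly hold (undefined = unset).
const candidates = ['false', 'FALSE', 'true', '', undefined, '0'];
for (const value of candidates) {
  console.log(JSON.stringify(value), '->', autoRejectEnabled(value));
}
```

Only 'false' in any letter case enables the branch. An unset variable, an empty string, and '0' all skip it, so the safety net is also missing whenever EVOLVE_BRIDGE is simply not set.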

Steps to Reproduce

  1. Start daemon: node index.js --loop with EVOLVE_BRIDGE=true
  2. Let it run a Bridge cycle that spawns a sub-agent
  3. The sub-agent produces "no changes detected" or times out
  4. Observe: daemon enters Ralph-loop, last_run.run_id != last_solidify.run_id
  5. Observe: cycle_progress.json shows phase: sleep forever
  6. Observe: cycleCount in evolution_state.json stops increasing
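
To confirm the state in steps 4 to 6 without reading the JSON by hand, a small diagnostic along these lines may help. It is a sketch: the file path is the one used in the workaround below, and readJsonOrNull and describeSolidifyState are hypothetical helpers, not part of Evolver.

```javascript
const fs = require('fs');

// Path taken from the workaround in this issue; adjust it for your install.
const STATE_PATH = 'memory/evolution/evolution_solidify_state.json';

// Hypothetical helper: returns the parsed JSON, or null if the file is missing or invalid.
function readJsonOrNull(path) {
  try {
    return JSON.parse(fs.readFileSync(path, 'utf8'));
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: reports whether the two run_id values currently disagree.
function describeSolidifyState(state) {
  if (!state) return { pending: false, detail: 'state file missing or unreadable' };
  const runId = state.last_run && state.last_run.run_id;
  const solidId = state.last_solidify && state.last_solidify.run_id;
  if (!runId) return { pending: false, detail: 'no last_run recorded' };
  if (!solidId) return { pending: true, detail: `last_run=${runId}, no last_solidify` };
  return {
    pending: String(runId) !== String(solidId),
    detail: `last_run=${runId}, last_solidify=${solidId}`,
  };
}

console.log(describeSolidifyState(readJsonOrNull(STATE_PATH)));
```

A single pending: true result is expected while a sub-agent is still working. The stall is indicated when the same pair of run_id values is still reported after several sleep intervals.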

Workaround

# Auto-fix: sync last_solidify to match last_run
python3 -c "
import json
with open('memory/evolution/evolution_solidify_state.json') as f:
    d = json.load(f)
d['last_solidify'] = {'run_id': d['last_run']['run_id'], 'status': 'auto_solidified', 'reason': 'workaround'}
with open('memory/evolution/evolution_solidify_state.json', 'w') as f:
    json.dump(d, f, indent=2, ensure_ascii=False)
    f.write('\n')
"

Or use a cron guard: */2 * * * * <path>/evolver-ralph-guard.sh

Proposed Fix

In the Bridge-mode path, after evolve.run() completes (whether by success, timeout, or error), the daemon should auto-solidify (or auto-reject) the pending run when no sub-agent result materializes. The existing rejectPendingRun() function (lines 1062-1072) already handles this correctly. It just needs to be called regardless of the EVOLVE_BRIDGE setting.

Suggested fix at line 614: remove the === 'false' guard so the auto-reject runs in all modes:

// Always auto-reject pending solidify after evolve.run() completes.
const stAfterRun = readJsonSafe(solidifyStatePath);
if (isPendingSolidify(stAfterRun)) {
  const cleared = rejectPendingRun(solidifyStatePath);
  if (cleared) {
    console.warn('[Loop] Auto-rejected pending run (sub-agent did not solidify).');
  }
}
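
The effect of the change can be checked against a simplified model. This is a sketch under loud assumptions: rejectPendingRunStub is a hypothetical stand-in, because the real rejectPendingRun() (lines 1062-1072) is not shown in this issue and takes a file path rather than an object. The stub assumes only that a rejected run stops counting as pending.

```javascript
// Same logic as isPendingSolidify() in the issue, written compactly.
function isPendingSolidify(state) {
  const runId = state && state.last_run && state.last_run.run_id;
  const solidId = state && state.last_solidify && state.last_solidify.run_id;
  if (!runId) return false;
  if (!solidId) return true;
  return String(solidId) !== String(runId);
}

// HYPOTHETICAL stand-in for rejectPendingRun(); the real implementation may differ.
function rejectPendingRunStub(state) {
  if (!isPendingSolidify(state)) return false;
  state.last_solidify = { run_id: state.last_run.run_id, rejected: true };
  return true;
}

// Model a Bridge-mode daemon whose sub-agent never calls solidify.
function simulate(iterations, autoRejectInBridgeMode) {
  const state = {};
  let cyclesRun = 0;
  for (let i = 0; i < iterations; i++) {
    if (isPendingSolidify(state)) continue; // gate: skip this pass
    cyclesRun++;
    state.last_run = { run_id: `run_${cyclesRun}` };
    // The sub-agent reports "no changes detected" and does not solidify.
    if (autoRejectInBridgeMode) rejectPendingRunStub(state);
  }
  return cyclesRun;
}

const cyclesBeforeFix = simulate(5, false);
const cyclesAfterFix = simulate(5, true);
console.log({ cyclesBeforeFix, cyclesAfterFix });
```

Without the auto-reject, the model runs one cycle and then stays gated. With it, every pass runs a cycle.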
