Minimal multi-process support (fork/exec/waitpid/pipe) #816

Draft
wdcui wants to merge 23 commits into main from wdcui/multiproc-minimal

@wdcui wdcui commented Apr 26, 2026

Summary

Adds minimal multi-process support to litebox's Linux userland platform within a single host process. All forks use vfork semantics (parent suspended, child runs in shared address space until exec or exit), and child processes exec into their own VA partitions.

Key features

  • fork/vfork: do_clone detects fork (!CLONE_THREAD), suspends parent via VforkDone futex, child shares parent's address space until exec
  • exec detach: Child gets a new 1 TiB VA partition from VaPartitionAllocator (128 partitions), loads PIE binary there
  • waitpid/wait4: Blocking and non-blocking wait with support for specific PID, any child (-1), own process group (0), and specific process group (-pgid)
  • Pipes: Fork-aware FD lifecycle with fork_refcount on descriptor slots; pipe EOF detection works correctly across processes
  • Signals: Cross-process signal mailbox, SIGCHLD on child exit, SIGPIPE on EPIPE, kill() to pid/pgid
  • Process groups: setpgid, getpgid, getpgrp, setsid
  • Close-on-exec: FD_CLOEXEC honored during execve
  • Orphan reparenting: Children reparented to init on parent exit
  • PATH resolution: execve searches $PATH for bare binary names
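
The four `pid` selectors listed in the waitpid/wait4 bullet can be pictured as a small classification step. This is a minimal sketch, not the PR's code: `WaitTarget` and `classify_wait_pid` are hypothetical names introduced here for illustration.

```rust
/// Which children a `waitpid`/`wait4` call is asking about.
/// Hypothetical type for illustration; not taken from the PR.
#[derive(Debug, PartialEq, Eq)]
pub enum WaitTarget {
    /// pid > 0: one specific child.
    Pid(u32),
    /// pid == -1: any child of the caller.
    AnyChild,
    /// pid == 0: any child in the caller's own process group.
    OwnGroup,
    /// pid < -1: any child in process group `|pid|`.
    Group(u32),
}

/// Maps the raw `pid` argument onto one of the four selectors.
pub fn classify_wait_pid(pid: i32) -> WaitTarget {
    match pid {
        p if p > 0 => WaitTarget::Pid(p.unsigned_abs()),
        -1 => WaitTarget::AnyChild,
        0 => WaitTarget::OwnGroup,
        // `unsigned_abs` is defined for i32::MIN, where `-p` would overflow.
        p => WaitTarget::Group(p.unsigned_abs()),
    }
}
```

Using `unsigned_abs` for the process-group case keeps the conversion well defined for every `i32`, including `i32::MIN`.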

Architecture

  • Core (litebox/): Platform-generic ProcessRegistry (parent/child tracking, exit status, process groups), AddressSpaceProvider trait, fork-aware FD refcounting (fork_refcount on IndividualEntry)
  • Platform (litebox_platform_linux_userland/): VaPartitionAllocator (128×1TiB partitions via bitmap)
  • Shim (litebox_shim_linux/): POSIX syscall implementations, cross-process signal mailbox, vfork parking, exec detach
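
To make the "128×1TiB partitions via bitmap" idea concrete, here is a rough sketch of how such an allocator could look. It is an assumption-laden illustration, not the actual `VaPartitionAllocator`: the `PartitionBitmap` type, its methods, and `partition_offset` are hypothetical, and `release` is included only to show that a bitmap permits reuse (the PR itself defers reclamation).

```rust
/// Partition count and size as stated in the PR description.
const PARTITION_COUNT: u32 = 128;
const PARTITION_SIZE: u64 = 1 << 40; // 1 TiB

/// Hypothetical bitmap allocator: bit `i` set means partition `i` is in use.
pub struct PartitionBitmap {
    used: u128,
}

impl PartitionBitmap {
    pub fn new() -> Self {
        Self { used: 0 }
    }

    /// Claims the lowest free partition, or returns `None` when all are taken.
    pub fn allocate(&mut self) -> Option<u32> {
        // The lowest zero bit of `used` is the lowest set bit of `!used`.
        let index = (!self.used).trailing_zeros();
        if index >= PARTITION_COUNT {
            return None;
        }
        self.used |= 1u128 << index;
        Some(index)
    }

    /// Marks a partition as free again (not done by the PR; shown for completeness).
    pub fn release(&mut self, index: u32) {
        debug_assert!(index < PARTITION_COUNT);
        self.used &= !(1u128 << index);
    }
}

/// Byte offset of a partition relative to an assumed base address.
pub fn partition_offset(index: u32) -> u64 {
    u64::from(index) * PARTITION_SIZE
}
```

A single `u128` happens to hold exactly 128 bits, so exhaustion shows up naturally as `trailing_zeros()` returning 128.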

Known limitations (minimal version)

  • VA partitions not reclaimed on process exit (128 available, sufficient for minimal use)
  • wait4 not interruptible by non-SIGCHLD signals
  • Only PIE (static-pie) binaries supported for fork children
  • Single host process only (no multi-host)
  • Vfork semantics: child shares parent address space before exec

Testing

  • test_fork_exec_wait: vfork + exec + waitpid, verify exit status
  • test_pipe_fork: pipe between two exec'd children (echo hello | cat)
  • test_kill_signal: cross-process kill + signal handling
  • test_waitpid_wnohang: non-blocking wait with WNOHANG
  • All existing tests pass (no regressions)

Review

3 rounds of 6-agent static analysis review (correctness ×2, security ×2, code quality ×2). All CRITICAL and HIGH issues fixed. See /workspace/docs/litebox/impl/log.md for full review log.

References

  • Design reference: origin/wdcui/agent-sandbox-poc (single-host multi-process, entangled with other changes)

wdcui added 17 commits April 26, 2026 06:41
Add single-host multi-process support for the Linux userland platform,
enabling piped command execution (e.g., echo hello | cat) within a
single host process. All forks use vfork semantics: parent suspends
while child runs in shared address space, child detaches to its own
VA partition on exec.

Key changes:
- ProcessRegistry for process lifecycle (create, exit, waitpid)
- AddressSpaceProvider trait + VA partition allocator (128x1TiB)
- GlobalState/ProcessState split (per-process PageManager)
- do_fork with vfork semantics and VforkDone futex signaling
- Exec detach to new address space before loading new binary
- Fork-aware FD close (Arc refcount prevents cross-process destruction)
- FD cleanup on process exit for proper pipe EOF detection
- vfork syscall handler, Wait4 syscall dispatch
- PIE-only children (dynamic ELF load hint via pm.addr_min())

Tested with fork+exec+waitpid and pipe-between-two-children (echo|cat).
No regressions in existing test suite.

- Fork-aware FD refcounting: fork_refcount on IndividualEntry tracks
  cross-process FD sharing; clone_for_fork creates independent OwnedFds
- Cross-process signal mailbox: BTreeMap-based per-PID mailboxes for
  delivering signals (SIGCHLD, kill) between processes
- Drain mailbox on return to userspace (prepare_to_run_guest,
  check_for_interrupt) — not just in waitpid
- SIGPIPE delivery on EPIPE for write, writev, sendto, sendmsg
- SignalState::clone_for_fork: deep-clone handlers, fresh shared_pending
- siginfo_chld: correctly decodes wait_status for CLD_EXITED vs CLD_KILLED
- Cross-process kill() routes to target's signal mailbox
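
The CLD_EXITED versus CLD_KILLED decision in siginfo_chld comes down to the low seven bits of the Linux wait status, which is the `(wait_status & 0x7f) == 0` test named later in this PR. A minimal sketch, assuming a hypothetical `decode_chld` helper; it deliberately ignores stopped children and core dumps:

```rust
/// `si_code` values for SIGCHLD as defined by Linux.
pub const CLD_EXITED: i32 = 1;
pub const CLD_KILLED: i32 = 2;

/// Returns `(si_code, si_status)` for a terminated child.
/// Hypothetical helper; stopped/continued and core-dump cases are not handled.
pub fn decode_chld(wait_status: i32) -> (i32, i32) {
    // Low 7 bits hold the terminating signal; zero means a normal exit.
    let term_sig = wait_status & 0x7f;
    if term_sig == 0 {
        // Normal exit: the exit code lives in bits 8..16.
        (CLD_EXITED, (wait_status >> 8) & 0xff)
    } else {
        (CLD_KILLED, term_sig)
    }
}
```

For example, a child that calls `exit(3)` yields a wait status of `3 << 8`, while a child killed by SIGKILL yields `9`.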

Add raw_fds_matching_metadata() to RawDescriptorStorage to correctly
resolve raw FD numbers (not slot indices) matching per-FD metadata.
Called in sys_execve before detach_to_new_address_space.

ProcessRegistry::reparent() updates parent/child relationships and
returns zombie status so caller can deliver SIGCHLD to new parent.
Orphan handler in prepare_for_exit sends SIGCHLD to init for zombies.

- pgid/sid fields in ProcessContext, inherited by fork children
- setpgid with self-or-child constraint
- setsid checks process group leader (pgid == pid), not session leader
- kill(0, sig) sends to own process group
- kill(-pgid, sig) sends to specific process group
- pids_in_group() collects running processes in a group

resolve_path_lookup() extracts PATH from envp, tries each directory,
falls back to /usr/bin:/bin if PATH not set.
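
The lookup described here can be sketched as a pure function that produces the candidate paths in search order. This is an illustration under assumptions, not the shim's `resolve_path_lookup`: `path_candidates` is a hypothetical name, empty PATH entries are simply skipped, and the existence check for each candidate is left to the caller.

```rust
/// Fallback search path when PATH is absent, as described in the commit message.
const DEFAULT_PATH: &str = "/usr/bin:/bin";

/// Returns candidate paths for `name`, in search order.
/// Names containing '/' are not searched and are returned unchanged.
pub fn path_candidates(name: &str, envp: &[&str]) -> Vec<String> {
    if name.contains('/') {
        return vec![name.to_string()];
    }
    // Find the first "PATH=..." entry in the environment, if any.
    let path = envp
        .iter()
        .find_map(|entry| entry.strip_prefix("PATH="))
        .unwrap_or(DEFAULT_PATH);
    path.split(':')
        // Simplification: skip empty entries instead of treating them as ".".
        .filter(|dir| !dir.is_empty())
        .map(|dir| format!("{}/{}", dir.trim_end_matches('/'), name))
        .collect()
}
```

A caller would try each candidate in turn and use the first one that opens successfully.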

Returns ENOSYS with log_unsupported if do_fork is called from a
multi-threaded process, preventing undefined behavior.

try_wait now handles pid==0 (own process group), pid<-1 (specific
process group), in addition to existing pid>0 and pid==-1.

readv: check pipe FD once before the iov loop, then use a dedicated
pipe read path that avoids the double-mutable-borrow of kernel_buffer
that would occur inside run_on_raw_fd closures.

writev: route pipe FDs through write_to_iovec inside run_on_raw_fd.

Add three integration tests:
- test_kill_signal: vfork+exec sleeper, kill with SIGKILL, verify WIFSIGNALED
- test_waitpid_wnohang: poll with WNOHANG until child exits
- test_exec_path_lookup: execve bare name triggers PATH search in shim

Fix: move PATH resolution before shebang resolution in sys_execve.
Previously, resolve_shebang tried to open the bare name (e.g. 'exit_with')
which failed with ENOENT before PATH lookup could run.
…ndling hardening

- Add process count limit (128) to prevent fork bombs
- Cap signal mailbox at 256 entries (drop oldest on overflow)
- Return ECHILD for process group waits with no matching children
- detach_to_new_address_space returns Result instead of panicking on VA exhaustion
- PID overflow returns CreateProcessError instead of panicking
- exit_process is idempotent (no assert on double-exit)
- Fork failure cleanup: remove zombie registry entry after spawn failure
- Store address_space_id on ProcessState for future VA partition reclamation
- Defer VA partition reclamation (128 partitions sufficient for minimal version)

- Fix CRITICAL wait4 futex race: snapshot exit epoch BEFORE try_wait
  to prevent missed wakeups when child exits between check and block
- Fix PID/TID namespace collision: advance_next_pid after thread
  creation, saturating_add for child_pid+1 overflow
- Remove duplicate unused ProcessRegistry from LiteBox core
- Remove dead exit_epoch field from RegistryInner
- Fix siginfo_chld: use (wait_status & 0x7f)==0 instead of fragile
  trailing_zeros heuristic for normal exit detection
- Replace unwrap/expect with Result propagation in fork punchthrough
- Fix try_wait doc comment to match actual return type semantics
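
The wait4 race fix follows a common futex pattern: read the epoch first, check for an exited child, and block only if the epoch is still unchanged. The sketch below emulates the futex with a mutex and condition variable; `ExitEpoch` and `blocking_wait` are hypothetical names, and the real code presumably uses the platform's futex primitives instead.

```rust
use std::sync::{Condvar, Mutex};

/// Futex-like counter emulated with a mutex and condition variable.
pub struct ExitEpoch {
    value: Mutex<u32>,
    changed: Condvar,
}

impl ExitEpoch {
    pub fn new() -> Self {
        Self { value: Mutex::new(0), changed: Condvar::new() }
    }

    pub fn load(&self) -> u32 {
        *self.value.lock().unwrap()
    }

    /// Called by an exiting child after it has published its exit status.
    pub fn bump(&self) {
        let mut value = self.value.lock().unwrap();
        *value = value.wrapping_add(1);
        self.changed.notify_all();
    }

    /// Like FUTEX_WAIT: blocks only while the value still equals `expected`.
    pub fn wait_while_eq(&self, expected: u32) {
        let mut value = self.value.lock().unwrap();
        while *value == expected {
            value = self.changed.wait(value).unwrap();
        }
    }
}

/// Blocking wait built on a non-blocking `try_wait` closure.
pub fn blocking_wait<T>(epoch: &ExitEpoch, mut try_wait: impl FnMut() -> Option<T>) -> T {
    loop {
        // 1. Snapshot the epoch BEFORE checking for exited children.
        let snapshot = epoch.load();
        // 2. Check. A child that exits after this check bumps the epoch.
        if let Some(status) = try_wait() {
            return status;
        }
        // 3. Returns immediately if the epoch moved since the snapshot.
        epoch.wait_while_eq(snapshot);
    }
}
```

If the snapshot were taken after `try_wait`, a child exiting between the check and the snapshot would leave the waiter blocked on an epoch that already includes that exit.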
…k ordering

- Fix HIGH: signal vfork_done before returning error when
  detach_to_new_address_space fails during exec, preventing parent hang
- Fix MEDIUM: remove redundant close-on-exec pass in sys_execve
  (close_on_exec() already handles it; second pass could double-decrement
  fork_refcount)
- Fix MEDIUM: add debug_assert that reaped zombie has no children in
  try_wait (children should have been reparented during exit_process)
- Fix MEDIUM: use unsigned_abs() instead of (-t).cast_unsigned() in
  try_wait to avoid i32::MIN overflow
- Fix MEDIUM: clone Arc from signal_mailboxes map and drop outer lock
  before acquiring per-mailbox lock to prevent nested lock acquisition
- Fix LOW: simplify redundant match arm in try_wait (t > 0 case)
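
The mailbox lock-ordering fix, together with the 256-entry cap from the earlier hardening commit, can be illustrated with a small sketch. The `SignalMailboxes` type and its methods here are hypothetical stand-ins; only the BTreeMap-per-PID layout, the cap of 256 with drop-oldest, and the "clone the Arc, then drop the outer lock" ordering come from the commit messages.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Cap stated in the hardening commit; the oldest entry is dropped on overflow.
const MAILBOX_CAPACITY: usize = 256;

type Mailbox = Arc<Mutex<VecDeque<i32>>>;

/// Hypothetical per-PID signal mailboxes.
pub struct SignalMailboxes {
    by_pid: Mutex<BTreeMap<u32, Mailbox>>,
}

impl SignalMailboxes {
    pub fn new() -> Self {
        Self { by_pid: Mutex::new(BTreeMap::new()) }
    }

    pub fn register(&self, pid: u32) {
        self.by_pid.lock().unwrap().entry(pid).or_default();
    }

    /// Posts `signo` to `pid`. Returns false if the PID has no mailbox.
    pub fn post(&self, pid: u32, signo: i32) -> bool {
        // Clone the Arc under the outer lock; the guard is a temporary that
        // is dropped at the end of this statement, before the inner lock.
        let mailbox = self.by_pid.lock().unwrap().get(&pid).cloned();
        let Some(mailbox) = mailbox else { return false };
        let mut queue = mailbox.lock().unwrap();
        if queue.len() >= MAILBOX_CAPACITY {
            queue.pop_front();
        }
        queue.push_back(signo);
        true
    }

    /// Removes and returns all pending signals for `pid`, oldest first.
    pub fn drain(&self, pid: u32) -> Vec<i32> {
        let mailbox = self.by_pid.lock().unwrap().get(&pid).cloned();
        let Some(mailbox) = mailbox else { return Vec::new() };
        let drained: Vec<i32> = mailbox.lock().unwrap().drain(..).collect();
        drained
    }
}
```

Because the map lock is never held while a per-mailbox lock is taken, the two locks are never nested.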

Fix clippy lints across litebox core, shim, and runner crates:
- similar_names: allow on create_process and sys_setpgid
- question_mark: use ? operator in reparent
- match_wildcard_for_single_variants: explicit ProcessState::Running
- collapsible_if: use let-chains (edition 2024)
- unnecessary_map_or: use is_none_or
- verbose_bit_mask: use trailing_zeros
- items_after_statements: move const before let bindings
- unnecessary boolean not: invert if/else branches
- dead_code: allow on compile_static_pie (unused in loader test)
- Fix cargo fmt formatting in process.rs
- Replace unstable if-let match guard with nested match (E0658 on SNP/LVBS)
- Add stub AddressSpaceProvider impl for WindowsUserland platform
…, test PID mismatch

- Use full path for AddressSpaceKind in Windows platform impl
- Remove needless let bindings in net.rs and unix.rs (clippy::let_and_return)
- Fix test_syscall_rewriter: override PID to 1 to match process registry
- Fix cargo fmt formatting
… test PID

- exit_process returns None instead of panicking when process not found
- Override PID to 1 in Windows runner test helper (matches process registry)
- Fixes test_stdio, test_syscall_rewriter, and Windows loader test panics
@jaybosamiya-ms jaybosamiya-ms added the expmt:shadow-kiln Tag to quickly find the different PRs as part of the "shadow kiln" experiment. label Apr 28, 2026
wdcui added 6 commits May 1, 2026 16:01
…nd unused clone_table

- fork_refcount → process_refcount
- on_dup → on_ref_added
- ForkDecremented → SharedDecremented
- clone_for_fork → clone_for_child_selective(Option<&[usize]>)
  None = inherit all (bulk), Some = selective (NT-style)
- increment_fork_refcounts → increment_process_refcounts
- Remove unused Descriptors::clone_table

Single Descriptors::clone_storage_for_child method combines FD storage
cloning with refcount bookkeeping — impossible to misuse by forgetting
to increment refcounts after cloning.

Single method on Descriptors handles cloning + refcount increment in one
pass. Removes the intermediate clone_for_child_selective from
RawDescriptorStorage — no need for two methods when there is one caller.

The method naturally belongs on the object being cloned (the per-process
FD table), not on Descriptors. Takes &mut Descriptors as a parameter for
refcount bookkeeping.
…dSubsystemEntry

No subsystem overrides these hooks — they were all default no-ops.
process_refcount on IndividualEntry handles cross-process sharing,
and Arc::strong_count handles within-process dup sharing. The hooks
added complexity without value; they can be re-added if a subsystem
actually needs them.

github-actions Bot commented May 1, 2026

🤖 SemverChecks 🤖 ⚠️ Potential breaking API changes detected ⚠️

--- failure trait_added_supertrait: non-sealed trait added new supertraits ---

Description:
A non-sealed trait added one or more supertraits, which breaks downstream implementations of the trait
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#generic-bounds-tighten
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/trait_added_supertrait.ron

Failed in:
  trait litebox::platform::Provider gained AddressSpaceProvider in file /home/runner/work/litebox/litebox/litebox/src/platform/mod.rs:30

--- failure enum_no_repr_variant_discriminant_changed: enum variant had its discriminant change value ---

Description:
The enum's variant had its discriminant value change. This breaks downstream code that used its value via a numeric cast like `as isize`.
        ref: https://doc.rust-lang.org/reference/items/enumerations.html#assigning-discriminant-values
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/enum_no_repr_variant_discriminant_changed.ron

Failed in:
  variant SyscallRequest::Getuid 79 -> 83 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2114
  variant SyscallRequest::Geteuid 80 -> 84 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2115
  variant SyscallRequest::Getgid 81 -> 85 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2116
  variant SyscallRequest::Getegid 82 -> 86 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2117
  variant SyscallRequest::Sysinfo 83 -> 87 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2118
  variant SyscallRequest::CapGet 84 -> 88 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2121
  variant SyscallRequest::GetDirent64 85 -> 89 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2125
  variant SyscallRequest::SchedGetAffinity 86 -> 90 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2130
  variant SyscallRequest::SchedYield 87 -> 91 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2135
  variant SyscallRequest::Futex 88 -> 92 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2136
  variant SyscallRequest::Execve 89 -> 93 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2139
  variant SyscallRequest::Umask 90 -> 94 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2144
  variant SyscallRequest::Prctl 91 -> 95 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2147
  variant SyscallRequest::Alarm 92 -> 96 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2150
  variant SyscallRequest::SetITimer 93 -> 97 in /home/runner/work/litebox/litebox/litebox_common_linux/src/lib.rs:2153
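
The discriminant shifts reported above are what Rust does when variants are inserted into the middle of an enum without explicit discriminants. A small illustration with made-up enums (`Before`, `MidInsert`, `Appended`, and the `First`/`Added` variants are hypothetical, not the real `SyscallRequest`):

```rust
/// Hypothetical enum before the change: discriminants follow declaration order.
#[allow(dead_code)]
pub enum Before {
    First,
    Getuid,
}

/// The same enum after a new variant is inserted ahead of `Getuid`.
#[allow(dead_code)]
pub enum MidInsert {
    First,
    Added,
    Getuid,
}

/// The same enum with the new variant appended at the end instead.
#[allow(dead_code)]
pub enum Appended {
    First,
    Getuid,
    Added,
}

/// Numeric value of `Getuid` in each of the three layouts.
pub fn discriminants() -> (isize, isize, isize) {
    (
        Before::Getuid as isize,
        MidInsert::Getuid as isize,
        Appended::Getuid as isize,
    )
}
```

Inserting in the middle moves `Getuid` from 1 to 2, while appending leaves it at 1, which is why the check only flags variants declared after the insertion point.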

Labels

expmt:shadow-kiln Tag to quickly find the different PRs as part of the "shadow kiln" experiment.

2 participants