---
title: Code-Then-Execute Pattern
status: emerging
authors: Nikola Balic (@nibzard)
based_on: DeepMind CaMeL (orig.)
source: Luca Beurer-Kellner et al. (2025)
category: Tool Use & Environment
tags:
  - dsl
  - sandbox
  - program-synthesis
  - auditability
---

## Problem

Plan lists are opaque: a flat sequence of tool calls reveals little about how data moves between steps, but we want full data-flow analysis and taint tracking.

## Solution

Have the LLM output a sandboxed program or DSL script instead of a plan list:

1. The LLM writes code that calls tools and untrusted-data processors.
2. A static checker / taint engine verifies the data flows (e.g., no tainted variable may reach `send_email.recipient`).
3. An interpreter runs the code in a locked-down sandbox.

```python
x = calendar.read(today)
y = QuarantineLLM.format(x)
email.write(to="john@acme.com", body=y)
```

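Step 2 can be sketched as a minimal taint engine: untrusted values are wrapped in a marker type, and a sink policy is checked before any tool call executes. All names here (`Tainted`, `SINK_POLICY`, `check_call`) are hypothetical illustrations, not an API from CaMeL or any library.

```python
class Tainted:
    """Wrapper marking a value derived from untrusted data."""
    def __init__(self, value, source):
        self.value = value
        self.source = source

def taint(value, source):
    return Tainted(value, source)

def is_tainted(value):
    return isinstance(value, Tainted)

# Policy: (tool, argument) pairs that must never receive tainted data.
SINK_POLICY = {("send_email", "recipient"), ("shell", "command")}

def check_call(tool, kwargs):
    """Raise before execution if a tainted value reaches a protected sink."""
    for arg, value in kwargs.items():
        if (tool, arg) in SINK_POLICY and is_tainted(value):
            raise PermissionError(
                f"tainted value from {value.source!r} flows into {tool}.{arg}")

# Mirrors the snippet above: the calendar read is trusted, the LLM output is not.
x = "meeting at 10am"                        # calendar.read(today)
y = taint("Hi John, ...", "QuarantineLLM")   # QuarantineLLM.format(x)

check_call("send_email", {"recipient": "john@acme.com", "body": y})  # ok: body is not a sink
try:
    check_call("send_email", {"recipient": y})  # blocked: tainted recipient
except PermissionError as e:
    print(e)
```

The key design choice is that the check runs over the whole program before anything executes, so a forbidden flow is rejected up front rather than caught mid-run.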
## How to use it

Complex multi-step agents such as SQL copilots and software-engineering bots.
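In such agents, step 3's locked-down interpreter can be approximated by auditing the AST of the generated script against a whitelist before running it. This is a hedged sketch using Python's standard `ast` module; the `ALLOWED_CALLS` whitelist and `audit` function are illustrative assumptions, not part of any existing framework.

```python
import ast

# Hypothetical whitelist of tool calls the generated script may make.
ALLOWED_CALLS = {"calendar.read", "QuarantineLLM.format", "email.write"}

def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def audit(script):
    """Reject imports and any call outside the whitelist; return the AST if clean."""
    tree = ast.parse(script)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise PermissionError("imports are not allowed")
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name not in ALLOWED_CALLS:
                raise PermissionError(f"call to {name!r} not whitelisted")
    return tree

SCRIPT = """
x = calendar.read("today")
y = QuarantineLLM.format(x)
email.write(to="john@acme.com", body=y)
"""
audit(SCRIPT)  # passes: only whitelisted calls
try:
    audit("__import__('os').system('rm -rf /')")  # rejected
except PermissionError as e:
    print(e)
```

A real sandbox would go further (resource limits, capability-scoped tool handles), but the audit-before-execute shape is what makes the pattern replayable and reviewable.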

## Trade-offs

- **Pros:** formal verifiability; replayable execution logs.
- **Cons:** requires DSL design and static-analysis infrastructure.

## References

- Debenedetti et al., *CaMeL* (2025); Beurer-Kellner et al., §3.1 (5).