Feature Request: Structured Data Retrieval During Prompt Execution #188
Description
Summary
PromptKit currently gives the LLM wide latitude in how it retrieves additional data during prompt execution and reasoning.
In practice this means the model is implicitly deciding:
- what additional context to retrieve
- when to retrieve it
- from where to retrieve it
- how much of it to retrieve
- and how to incorporate it into the reasoning chain
While this flexibility is useful for exploratory tasks, it leads to prompt executions that are:
- non-deterministic
- difficult to audit
- difficult to reproduce
- difficult to reason about post-hoc
- sensitive to model implementation details
This becomes particularly problematic for engineering workflows (requirements extraction, protocol validation, maintenance audits, corpus sanitation, etc.) where prompt execution needs to be explainable and repeatable.
Problem Statement
At present, data retrieval during prompt execution is effectively:
model-directed rather than workflow-directed
The LLM is free to:
- implicitly expand context
- decide when additional data is required
- select sources from repository state or prior context
- retrieve information in a non-observable way
- alter retrieval strategy across executions
As a result:
| Property | Current Behavior |
|---|---|
| Retrieval intent | Implicit |
| Retrieval timing | Model-decided |
| Retrieval scope | Model-decided |
| Observability | Low |
| Determinism | Low |
| Auditability | Low |
| Reproducibility | Low |
This leads to "vibey" prompt execution: reasoning outcomes may change even when none of the following has changed:
- the prompt
- the repository state
- the workflow definition
The only difference is the set of retrieval decisions the model happened to make.
Proposed Direction
PromptKit should introduce structured data retrieval as a first-class execution phase within prompt workflows.
Instead of allowing the LLM to freely pull context during reasoning, workflows should be able to:
- Declare retrieval requirements up-front
- Specify retrieval sources explicitly
- Define retrieval scope deterministically
- Bind retrieved context into the execution pipeline prior to reasoning
Conceptually:
Current:

    LLM
    └── reasoning
        └── implicit retrieval (model decides)

Proposed:

    Workflow
    ├── Retrieval Phase (declared)
    ├── Context Binding
    └── Reasoning Phase (bounded)
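
To make the proposed phases concrete, here is a minimal Python sketch of the three steps. It is not an existing PromptKit API: `RetrievalRequirement`, `run_workflow`, the `sources` mapping, and the `{context}` placeholder are all hypothetical names chosen for illustration, and the model is stood in for by a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class RetrievalRequirement:
    """A declared input: a label for the prompt and the source it comes from."""
    name: str
    source: str  # hypothetical key into a source registry, e.g. "protocol_docs"


def run_workflow(
    requirements: Sequence[RetrievalRequirement],
    sources: Mapping[str, str],
    template: str,
    reason: Callable[[str], str],
) -> str:
    # Retrieval phase: fetch only what the workflow declared.
    retrieved = {req.name: sources[req.source] for req in requirements}

    # Context binding: emit sections in sorted order so the composed
    # prompt does not depend on declaration order.
    context = "\n\n".join(
        f"## {name}\n{retrieved[name]}" for name in sorted(retrieved)
    )
    prompt = template.replace("{context}", context)

    # Reasoning phase: the model callable receives the prompt and nothing
    # else, so it has no handle for pulling in further context.
    return reason(prompt)
```

The point of the sketch is the shape, not the details: retrieval and binding finish before the model is invoked, and anything not declared (an extra entry in `sources`, for example) never reaches the prompt.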
Functional Requirements
PromptKit SHOULD allow workflow authors to:
- Declare required data inputs as part of the workflow definition
- Specify retrieval source(s) (e.g. repo files, prompt corpus, protocol docs)
- Define retrieval scope (paths, tags, taxonomy classes, etc.)
- Execute retrieval before the reasoning phase
- Bind retrieved context into the composed prompt deterministically
- Prevent unstructured context expansion during the reasoning phase
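
As one possible reading of "define retrieval scope deterministically" for repo files, the sketch below resolves declared glob patterns to a sorted, de-duplicated list of files with content hashes. The function name `resolve_scope` and the choice of glob patterns plus SHA-256 are assumptions for illustration; tags or taxonomy classes would need a different resolver.

```python
import hashlib
from pathlib import Path


def resolve_scope(root: Path, patterns: list[str]) -> list[tuple[str, str]]:
    """Resolve declared glob patterns to (relative path, sha256) pairs."""
    matched: set[str] = set()
    for pattern in patterns:
        for path in root.glob(pattern):
            if path.is_file():
                # Store relative POSIX paths so results are comparable
                # across machines and checkout locations.
                matched.add(path.relative_to(root).as_posix())
    # Sorting removes any dependence on filesystem or pattern order.
    return [
        (rel, hashlib.sha256((root / rel).read_bytes()).hexdigest())
        for rel in sorted(matched)
    ]
```

Recording a hash next to each path means two runs can be compared on what was actually read, not only on which paths were requested.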
Non-Goals
This proposal does NOT aim to:
- eliminate model-driven reasoning
- eliminate dynamic workflow selection
- prevent exploratory execution modes
Instead it aims to:
separate data acquisition from reasoning
so that prompt execution becomes:
- reproducible
- inspectable
- explainable
- engineering-grade
Example Use Cases
This would improve determinism for:
- corpus audit workflows
- maintenance prompts
- protocol validation
- taxonomy sanitation
- requirements extraction
- template consistency analysis
where consistent outcomes across executions are required.
Motivation
PromptKit positions prompts as structured engineering artifacts.
Allowing implicit, model-directed data retrieval during execution undermines:
- workflow predictability
- execution traceability
- CI-driven prompt validation
- long-term maintenance workflows
Structured retrieval would make prompt execution behave more like:
a deterministic pipeline
rather than
a best-effort interactive reasoning session
Open Questions
- Should retrieval policies be declared at:
  - persona level?
  - protocol level?
  - workflow level?
- Should workflows be able to restrict additional retrieval during reasoning?
- Should retrieval plans be surfaced in execution logs?
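
If retrieval plans were surfaced in execution logs, one possible record shape is sketched below. This is only an illustration of the question, not a proposed format: `render_retrieval_plan`, the `source`/`scope` keys, and the `allow_reasoning_retrieval` flag are hypothetical.

```python
import json


def render_retrieval_plan(
    workflow: str,
    entries: list[dict],
    allow_reasoning_retrieval: bool,
) -> str:
    """Serialize a retrieval plan as a stable JSON log record."""
    record = {
        "workflow": workflow,
        # Sort entries so the logged plan is independent of declaration order.
        "retrieval_plan": sorted(entries, key=lambda e: (e["source"], e["scope"])),
        "allow_reasoning_retrieval": allow_reasoning_retrieval,
    }
    # sort_keys gives a byte-stable rendering that is easy to diff between runs.
    return json.dumps(record, sort_keys=True, indent=2)
```

A stable rendering like this would let two execution logs be compared with an ordinary text diff.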
Acceptance Criteria
A PromptKit workflow can:
- declare required retrieval inputs
- execute retrieval deterministically prior to reasoning
- bind retrieved context into prompt assembly
- produce consistent execution results across runs given identical inputs
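
One way the "identical inputs" part of the last criterion could be checked in CI is to fingerprint everything that feeds the reasoning phase. The sketch below is an assumption about how that might look (`run_fingerprint` is a hypothetical helper); it covers only the inputs handed to the model, so consistency of the model's output would still depend on sampling settings that are outside its scope.

```python
import hashlib
import json


def run_fingerprint(prompt: str, workflow_definition: dict, retrieved: dict) -> str:
    """Hash the prompt, workflow definition, and retrieved context together."""
    payload = json.dumps(
        {"prompt": prompt, "workflow": workflow_definition, "retrieved": retrieved},
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # fixed separators avoid whitespace drift
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two runs with the same fingerprint received the same inputs; a differing fingerprint points to a change in the prompt, the workflow definition, or the retrieved data.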