Labels: enhancement (New feature or request)
Summary
Add an `agentv import promptfoo` command that converts `promptfooconfig.yaml` into `EVAL.yaml`, so that users with existing promptfoo test suites can adopt AgentV without rewriting their evaluations.
Motivation
promptfoo is the largest open-source LLM eval tool (6,400+ stars, 10M+ claimed users). Supporting their format gives AgentV instant access to the largest eval community and reduces adoption friction. Users shouldn't have to choose — they should be able to use both tools on the same test suites.
Research reference: integration-assessment-promptfoo-braintrust.md
Core Assertion Type Mapping
| promptfoo assertion | AgentEvals evaluator | Notes |
|---|---|---|
| `llm-rubric` | `llm_judge` (freeform mode) | Direct mapping |
| `factuality` | `llm_judge` (rubric mode, factuality prompt) | Prompt template differs |
| `g-eval` | `llm_judge` (rubric mode, CoT) | G-Eval is CoT-enhanced rubric |
| `contains` / `icontains` | `field_accuracy` (contains mode) | Case sensitivity flag |
| `equals` | `field_accuracy` (exact mode) | Direct mapping |
| `regex` | `field_accuracy` (regex mode) | Direct mapping |
| `is-json` | `field_accuracy` (json_valid mode) | Schema validation optional |
| `similar` | `field_accuracy` (semantic mode) | Embedding-based |
| `tool-call-f1` | `tool_trajectory` (any_order mode) | F1 vs match semantics differ |
| `cost` / `latency` | `execution_metrics` | Direct mapping |
| `javascript` / `python` | `code_judge` | Language flag |
| `context-faithfulness` | `llm_judge` (faithfulness prompt) | RAG-specific |
| `context-recall` | `llm_judge` (recall prompt) | RAG-specific |
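
The mapping above could be held in a plain lookup table. The following is a minimal sketch under stated assumptions: the output keys (`type`, `mode`, `prompt`, `case_sensitive`, `metric`, `language`, `chain_of_thought`) and the helper `map_assertion` are hypothetical and do not reflect AgentV's actual schema. Only the promptfoo-side fields `type`, `value`, and `threshold` are taken from promptfoo's assertion format.

```python
# Hypothetical lookup table mirroring the mapping table above.
# The evaluator keys are illustrative, not AgentV's real schema.
ASSERTION_MAP = {
    "llm-rubric": {"type": "llm_judge", "mode": "freeform"},
    "factuality": {"type": "llm_judge", "mode": "rubric", "prompt": "factuality"},
    "g-eval": {"type": "llm_judge", "mode": "rubric", "chain_of_thought": True},
    "contains": {"type": "field_accuracy", "mode": "contains", "case_sensitive": True},
    "icontains": {"type": "field_accuracy", "mode": "contains", "case_sensitive": False},
    "equals": {"type": "field_accuracy", "mode": "exact"},
    "regex": {"type": "field_accuracy", "mode": "regex"},
    "is-json": {"type": "field_accuracy", "mode": "json_valid"},
    "similar": {"type": "field_accuracy", "mode": "semantic"},
    "tool-call-f1": {"type": "tool_trajectory", "mode": "any_order"},
    "cost": {"type": "execution_metrics", "metric": "cost"},
    "latency": {"type": "execution_metrics", "metric": "latency"},
    "javascript": {"type": "code_judge", "language": "javascript"},
    "python": {"type": "code_judge", "language": "python"},
    "context-faithfulness": {"type": "llm_judge", "prompt": "faithfulness"},
    "context-recall": {"type": "llm_judge", "prompt": "recall"},
}


def map_assertion(assertion: dict) -> dict:
    """Convert one promptfoo assertion dict into a hypothetical evaluator dict."""
    kind = assertion["type"]
    if kind not in ASSERTION_MAP:
        raise KeyError(f"no mapping for promptfoo assertion type: {kind}")
    evaluator = dict(ASSERTION_MAP[kind])  # copy so the table is never mutated
    # Carry over the comparison value and threshold when the source has them.
    for key in ("value", "threshold"):
        if key in assertion:
            evaluator[key] = assertion[key]
    return evaluator
```

Keeping the table as data rather than branching logic would make it straightforward to print for a dry run and to extend when new assertion types are supported.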
What Doesn't Map Cleanly
- `assert-set` with threshold logic: see #235, "feat: assert-set evaluator with threshold logic (N% must pass)"
- `not-` prefix negation: needs a `negate: true` flag (see #271, "feat: promptfoo config import (promptfooconfig.yaml → EVAL.yaml)")
- Combinatorial variable expansion: keep as promptfoo-specific, document that users pre-expand
- Matrix evaluation (prompts × providers × tests): different paradigm, AgentV evaluates a single target
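
Two of the items above lend themselves to a short illustration: stripping the `not-` prefix into a negation flag, and pre-expanding list-valued variables. This is a rough sketch only. The helper names `split_negation` and `expand_vars` are hypothetical, and the expansion assumes that list-valued `vars` are meant to produce one test per combination (a Cartesian product).

```python
from itertools import product


def split_negation(assertion_type: str) -> tuple[str, bool]:
    """Split a type such as "not-contains" into ("contains", True)."""
    if assertion_type.startswith("not-"):
        return assertion_type[len("not-"):], True
    return assertion_type, False


def expand_vars(variables: dict) -> list[dict]:
    """Pre-expand list-valued vars into one vars dict per combination."""
    keys = list(variables)
    # Wrap scalars in a one-element list so product() treats them uniformly.
    choices = [v if isinstance(v, list) else [v] for v in variables.values()]
    return [dict(zip(keys, combo)) for combo in product(*choices)]
```

The boolean returned by `split_negation` could feed the proposed `negate: true` flag, while `expand_vars` shows the kind of pre-expansion users would otherwise be asked to do by hand.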
CLI Interface
```bash
# Convert a promptfoo config to EVAL.yaml
agentv import promptfoo ./promptfooconfig.yaml
# Convert with output path
agentv import promptfoo ./promptfooconfig.yaml -o ./evals/EVAL.yaml
# Dry run: show mapping without writing
agentv import promptfoo ./promptfooconfig.yaml --dry-run
```
Acceptance Criteria
- Parses `promptfooconfig.yaml` (YAML and JSON variants)
- Maps the top 15 most-used assertion types to AgentEvals evaluators
- Converts inline test cases (`tests:` section)
- Handles `file://` references for external test data (CSV, JSONL, YAML)
- Converts `defaultTest` to shared evaluator config
- Preserves `description` and `vars` metadata
- Unmappable assertions converted to `code_judge` with a comment noting the original type
- `--dry-run` flag shows the mapping without writing files
- Integration tests with real promptfoo config examples
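
As an illustration of two of the criteria above (`defaultTest` handling and the `code_judge` fallback), here is a small sketch. It assumes that per-test `vars` override defaults of the same name and that default assertions are applied in addition to a test's own. The function names and the `comment` and `original` keys are hypothetical placeholders, not AgentV's schema.

```python
def merge_default_test(default_test: dict, test: dict) -> dict:
    """Combine a shared defaultTest with one test case (assumed semantics)."""
    merged = {
        # Per-test vars override defaults with the same name.
        "vars": {**default_test.get("vars", {}), **test.get("vars", {})},
        # Default assertions come first, followed by the test's own.
        "assert": [*default_test.get("assert", []), *test.get("assert", [])],
    }
    if "description" in test:
        merged["description"] = test["description"]
    return merged


def fallback_evaluator(assertion: dict) -> dict:
    """Wrap an unmappable assertion as a code_judge stub that records its origin."""
    return {
        "type": "code_judge",
        "comment": f"Unmapped promptfoo assertion: {assertion['type']}",
        "original": assertion,
    }
```

Keeping the original assertion alongside the stub means nothing is lost in conversion, so a user can later hand-write the judge logic from the recorded source.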
Effort Estimate
3-5 days
Design Principle
Format conversion only — no runtime dependency on promptfoo. AgentV parses the YAML itself and maps to its own types.