feat: promptfoo config import (promptfooconfig.yaml → EVAL.yaml) #271

@christso

Summary

Add an agentv import promptfoo command that converts promptfooconfig.yaml into EVAL.yaml, so that users with existing promptfoo test suites can adopt AgentV without rewriting their evaluations.

Motivation

promptfoo is one of the largest open-source LLM eval tools (6,400+ stars, 10M+ claimed users). Supporting its config format gives AgentV direct access to that existing eval community and reduces adoption friction. Users shouldn't have to choose between the tools; they should be able to run both against the same test suites.

Research reference: integration-assessment-promptfoo-braintrust.md

Core Assertion Type Mapping

| promptfoo assertion | AgentEvals evaluator | Notes |
|---|---|---|
| llm-rubric | llm_judge (freeform mode) | Direct mapping |
| factuality | llm_judge (rubric mode, factuality prompt) | Prompt template differs |
| g-eval | llm_judge (rubric mode, CoT) | G-Eval is a CoT-enhanced rubric |
| contains / icontains | field_accuracy (contains mode) | Case-sensitivity flag |
| equals | field_accuracy (exact mode) | Direct mapping |
| regex | field_accuracy (regex mode) | Direct mapping |
| is-json | field_accuracy (json_valid mode) | Schema validation optional |
| similar | field_accuracy (semantic mode) | Embedding-based |
| tool-call-f1 | tool_trajectory (any_order mode) | F1 vs. match semantics differ |
| cost / latency | execution_metrics | Direct mapping |
| javascript / python | code_judge | Language flag |
| context-faithfulness | llm_judge (faithfulness prompt) | RAG-specific |
| context-recall | llm_judge (recall prompt) | RAG-specific |
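The table above can be encoded as a plain lookup from a promptfoo assertion type to an evaluator template. The following is a minimal Python sketch for illustration, not AgentV's implementation: the names `ASSERTION_MAP` and `map_assertion` and the output keys (`mode`, `case_sensitive`, `prompt`, `chain_of_thought`, `metric`, `language`) are hypothetical, and only promptfoo's `type`, `value`, and `threshold` assertion fields are carried over.

```python
from typing import Any, Optional

# Hypothetical encoding of the mapping table: promptfoo type -> evaluator template.
# Evaluator and mode names come from the table; the dict shape is an assumption.
ASSERTION_MAP: dict[str, dict[str, Any]] = {
    "llm-rubric": {"type": "llm_judge", "mode": "freeform"},
    "factuality": {"type": "llm_judge", "mode": "rubric", "prompt": "factuality"},
    "g-eval": {"type": "llm_judge", "mode": "rubric", "chain_of_thought": True},
    "contains": {"type": "field_accuracy", "mode": "contains", "case_sensitive": True},
    "icontains": {"type": "field_accuracy", "mode": "contains", "case_sensitive": False},
    "equals": {"type": "field_accuracy", "mode": "exact"},
    "regex": {"type": "field_accuracy", "mode": "regex"},
    "is-json": {"type": "field_accuracy", "mode": "json_valid"},
    "similar": {"type": "field_accuracy", "mode": "semantic"},
    "tool-call-f1": {"type": "tool_trajectory", "mode": "any_order"},
    "cost": {"type": "execution_metrics", "metric": "cost"},
    "latency": {"type": "execution_metrics", "metric": "latency"},
    "javascript": {"type": "code_judge", "language": "javascript"},
    "python": {"type": "code_judge", "language": "python"},
    "context-faithfulness": {"type": "llm_judge", "prompt": "faithfulness"},
    "context-recall": {"type": "llm_judge", "prompt": "recall"},
}


def map_assertion(assertion: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return an evaluator config, or None when the type has no direct mapping."""
    template = ASSERTION_MAP.get(assertion.get("type", ""))
    if template is None:
        return None
    evaluator = dict(template)  # copy so the shared table is never mutated
    for key in ("value", "threshold"):
        if key in assertion:
            evaluator[key] = assertion[key]
    return evaluator
```

Returning None for unknown types leaves the fallback policy (for example, a code_judge placeholder as described in the acceptance criteria) to the caller.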

What Doesn't Map Cleanly

CLI Interface

```bash
# Convert a promptfoo config to EVAL.yaml
agentv import promptfoo ./promptfooconfig.yaml

# Convert with an explicit output path
agentv import promptfoo ./promptfooconfig.yaml -o ./evals/EVAL.yaml

# Dry run: show the mapping without writing files
agentv import promptfoo ./promptfooconfig.yaml --dry-run
```
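The proposed command surface (a positional config path, -o, and --dry-run) can be sketched with Python's argparse to pin down argument handling. This is illustrative only: `build_parser` and `resolve_output` are hypothetical names, and the default output location (EVAL.yaml next to the source config when -o is omitted) is an assumption the issue does not specify.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the proposed `agentv import promptfoo` command."""
    parser = argparse.ArgumentParser(prog="agentv")
    commands = parser.add_subparsers(dest="command", required=True)
    import_cmd = commands.add_parser("import", help="import evals from another tool")
    sources = import_cmd.add_subparsers(dest="source", required=True)
    promptfoo = sources.add_parser("promptfoo", help="convert promptfooconfig.yaml to EVAL.yaml")
    promptfoo.add_argument("config", type=Path, help="path to promptfooconfig.yaml (or .json)")
    promptfoo.add_argument("-o", dest="output", type=Path, default=None, help="output EVAL.yaml path")
    promptfoo.add_argument("--dry-run", action="store_true", help="print the mapping, write nothing")
    return parser


def resolve_output(args: argparse.Namespace) -> Path:
    # Assumption: without -o, EVAL.yaml is written next to the source config.
    return args.output if args.output is not None else args.config.with_name("EVAL.yaml")
```

For example, `build_parser().parse_args(["import", "promptfoo", "./promptfooconfig.yaml", "--dry-run"])` yields a namespace with `dry_run` set to True, which a converter could check before writing anything.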

Acceptance Criteria

  • Parses promptfooconfig.yaml (YAML and JSON variants)
  • Maps the top 15 most-used assertion types to AgentEvals evaluators
  • Converts inline test cases (tests: section)
  • Handles file:// references for external test data (CSV, JSONL, YAML)
  • Converts defaultTest to shared evaluator config
  • Preserves description and vars metadata
  • Unmappable assertions converted to code_judge with a comment noting the original type
  • --dry-run flag shows the mapping without writing files
  • Integration tests with real promptfoo config examples
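Several of these criteria (inline tests, defaultTest as shared evaluator config, preserved description and vars, code_judge fallback) can be sketched as a single transformation over an already-parsed config. This minimal Python sketch assumes promptfoo's `tests`, `defaultTest`, `vars`, `assert`, and `description` keys; the output keys (`evaluators`, `cases`, `comment`, `original`), the names `MAPPED`, `convert_assertion`, and `convert_config`, and the small type subset are hypothetical, not the EVAL.yaml schema. It also assumes test-level vars override defaultTest vars of the same name.

```python
from typing import Any

# Hypothetical subset of the assertion mapping; a real importer would cover the full table.
MAPPED: dict[str, dict[str, Any]] = {
    "llm-rubric": {"type": "llm_judge", "mode": "freeform"},
    "equals": {"type": "field_accuracy", "mode": "exact"},
    "icontains": {"type": "field_accuracy", "mode": "contains", "case_sensitive": False},
    "is-json": {"type": "field_accuracy", "mode": "json_valid"},
}


def convert_assertion(assertion: dict[str, Any]) -> dict[str, Any]:
    a_type = assertion.get("type", "")
    if a_type in MAPPED:
        evaluator = dict(MAPPED[a_type])  # copy the template
        if "value" in assertion:
            evaluator["value"] = assertion["value"]
        return evaluator
    # Unmappable: fall back to a code_judge placeholder that records the original type.
    # A YAML writer could emit "comment" as a real YAML comment instead of a key.
    return {
        "type": "code_judge",
        "comment": f"TODO: port promptfoo assertion '{a_type}' manually",
        "original": dict(assertion),
    }


def convert_config(config: dict[str, Any]) -> dict[str, Any]:
    default = config.get("defaultTest", {})
    default_vars = default.get("vars", {})
    cases = []
    for test in config.get("tests", []):
        case: dict[str, Any] = {
            # Test-level vars override defaultTest vars with the same name.
            "vars": {**default_vars, **test.get("vars", {})},
            "evaluators": [convert_assertion(a) for a in test.get("assert", [])],
        }
        if "description" in test:
            case["description"] = test["description"]
        cases.append(case)
    return {
        # defaultTest assertions become evaluators shared by every case.
        "evaluators": [convert_assertion(a) for a in default.get("assert", [])],
        "cases": cases,
    }
```

Keeping the fallback as data (type plus original assertion) makes it easy for --dry-run to list every assertion that still needs manual porting.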

Effort Estimate

3-5 days

Design Principle

Format conversion only — no runtime dependency on promptfoo. AgentV parses the YAML itself and maps to its own types.
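Under this principle, loading the config and any file:// test references is ordinary file parsing. A minimal Python sketch, assuming each JSONL line holds one test object, each CSV row holds var values keyed by column name (special promptfoo columns and glob patterns are not handled), and PyYAML is available for YAML files; `load_structured`, `load_tests_ref`, and `resolve_tests` are hypothetical names.

```python
import csv
import json
from pathlib import Path
from typing import Any


def load_structured(path: Path) -> Any:
    """Parse a JSON or YAML file, choosing the parser by file extension."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        return json.loads(text)
    import yaml  # assumption: PyYAML is installed for .yaml/.yml files

    return yaml.safe_load(text)


def load_tests_ref(ref: str, base_dir: Path) -> list[dict[str, Any]]:
    """Load test cases from a file:// reference relative to the config directory."""
    path = base_dir / ref.removeprefix("file://")
    if path.suffix == ".jsonl":
        lines = path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines if line.strip()]
    if path.suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            # Assumption: every column is a var; no special-column handling.
            return [{"vars": dict(row)} for row in csv.DictReader(fh)]
    data = load_structured(path)  # YAML or JSON test files
    return data if isinstance(data, list) else [data]


def resolve_tests(config: dict[str, Any], base_dir: Path) -> list[dict[str, Any]]:
    """Expand file:// entries in `tests` into inline test dicts."""
    tests = config.get("tests", [])
    if isinstance(tests, str):
        tests = [tests]
    resolved: list[dict[str, Any]] = []
    for entry in tests:
        if isinstance(entry, str) and entry.startswith("file://"):
            resolved.extend(load_tests_ref(entry, base_dir))
        else:
            resolved.append(entry)
    return resolved
```

Resolving references relative to the config's directory keeps conversion independent of the working directory the command is run from.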

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests