tracking: result schema v2 sprint — #280 + related issues #281

## Summary

Sprint tracker for the result schema v2 work centered on #280 and its coupled issues. Schema v2 is the foundation: it defines the data contract that the OpenTelemetry (OTel) export (#277) and the evaluator enhancements (#273, #274) consume.

## Sprint Plan

### Phase 1: Foundation (do first)

- [ ] #280 — Result schema v2: rename fields, add `input`, nest debug requests (2–3 days)

### Phase 2: Build on new schema (after #280 lands)

- [ ] #277 — OpenTelemetry trace export / OTLP/HTTP (3–5 days, consumes new schema directly)
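To make the hand-off between #280 and #277 concrete, here is a minimal sketch of wrapping one v2 result as a span in an OTLP/HTTP JSON request. It assumes a result is a dict carrying the new `input` and `output` fields. The `eval.*` attribute keys, the scope name, and both helper names are hypothetical placeholders, not the exporter #277 will actually ship. The `service.name` resource attribute, the hex-encoded trace/span IDs, integer enum values, and the default collector endpoint `http://localhost:4318/v1/traces` follow the OTLP specification.

```python
import json
import secrets
import time
import urllib.request
from typing import Optional


def result_to_otlp_json(result: dict, span_name: str,
                        start_ns: Optional[int] = None,
                        end_ns: Optional[int] = None,
                        service_name: str = "eval-runner") -> dict:
    """Wrap one v2 result as a single span inside an OTLP/HTTP JSON export request."""
    now = time.time_ns()
    start = now if start_ns is None else start_ns
    end = now if end_ns is None else end_ns

    def str_attr(key: str, value: str) -> dict:
        return {"key": key, "value": {"stringValue": value}}

    span = {
        "traceId": secrets.token_hex(16),  # 16 random bytes -> 32 hex chars
        "spanId": secrets.token_hex(8),    # 8 random bytes -> 16 hex chars
        "name": span_name,
        "kind": 1,                         # SPAN_KIND_INTERNAL (enums are integers in OTLP JSON)
        "startTimeUnixNano": str(start),   # 64-bit ints encoded as decimal strings
        "endTimeUnixNano": str(end),
        # Hypothetical attribute keys; the real mapping is #277's decision.
        "attributes": [
            str_attr("eval.input", json.dumps(result["input"])),
            str_attr("eval.output", json.dumps(result["output"])),
        ],
    }
    return {
        "resourceSpans": [{
            "resource": {"attributes": [str_attr("service.name", service_name)]},
            "scopeSpans": [{"scope": {"name": "eval-export"}, "spans": [span]}],
        }]
    }


def post_otlp(payload: dict,
              endpoint: str = "http://localhost:4318/v1/traces") -> int:
    """POST the payload to an OTLP/HTTP collector and return the HTTP status."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

`post_otlp` needs a running collector; the real exporter may just as well use the OpenTelemetry SDK instead of hand-built JSON.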

### Phase 3: Parallel evaluator enhancements (independent, can run alongside Phases 1–2)

- [ ] #273 — Evaluator negation flag `negate: true` (1–2 days)
- [ ] #274 — Threshold aggregation strategy for composite evaluator (1–2 days)
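One possible reading of #273 and #274, sketched in Python with a hypothetical `Verdict` type standing in for the evaluator result: `negate: true` inverts pass/fail and mirrors the score (`1 - score`, assuming scores lie in [0, 1]), and the threshold strategy passes a composite when the fraction of passing children meets a threshold. Both semantics are assumptions for discussion; the issues themselves define the real behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical stand-in for one evaluator/assertion result (real type comes from #280)."""
    passed: bool
    score: float  # assumed to lie in [0.0, 1.0]


def apply_negate(v: Verdict, negate: bool) -> Verdict:
    """#273 sketch: when negate is set, flip pass/fail and mirror the score."""
    if not negate:
        return v
    return Verdict(passed=not v.passed, score=1.0 - v.score)


def threshold_aggregate(children: list, threshold: float) -> Verdict:
    """#274 sketch: composite passes when the fraction of passing children >= threshold."""
    if not children:
        raise ValueError("composite evaluator needs at least one child")
    pass_rate = sum(c.passed for c in children) / len(children)
    return Verdict(passed=pass_rate >= threshold, score=pass_rate)
```

Applying `apply_negate` to each child before `threshold_aggregate` lets a composite mix plain and negated checks, and keeps the two features independent of each other.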

### Independent (any time, no code dependencies)

- [ ] #278 — Agent eval layer taxonomy docs
- [ ] #279 — Autoevals/Braintrust scorer docs

### Deferred (next sprint, after schema stabilizes)

- #275 — Shared JSONL/CSV dataset compatibility with promptfoo
- #271 — promptfoo config import
- #272 — promptfoo config export
- #276 — promptfoo integration tracker

## Sequencing Rationale

- **#280 is the foundation** — renames `EvaluatorResult` → `AssertionResult`, `OutputMessage` → `Message`, `output_messages` → `output`, adds `input` field. Every downstream issue consumes these types.
- **#273/#274 are safe in parallel** — they add evaluator features that don't touch the renamed result schema fields. The only expected overlap is a trivial merge conflict from the type renames.
- **promptfoo deferred** — #275, #271, #272, #276 depend on the finalized schema but are lower priority this sprint. Queue for next sprint.
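Since every downstream issue consumes the renamed shape, a small upgrade shim for stored v1 records may ease the transition. The sketch below is hypothetical: it assumes results are serialized as JSON objects and that `output_messages` → `output` is the only field-level rename (the `EvaluatorResult` and `OutputMessage` type renames are assumed not to appear in serialized records). It leaves out the debug-request nesting because the target layout isn't spelled out here.

```python
import copy

# Field-level rename from #280 (assumed to be the only one visible in stored records).
FIELD_RENAMES = {"output_messages": "output"}


def migrate_result_v1_to_v2(record: dict) -> dict:
    """Hypothetical shim: upgrade one v1 result record to the v2 shape without mutating it."""
    out = copy.deepcopy(record)
    for old, new in FIELD_RENAMES.items():
        if old in out:
            if new in out:
                raise ValueError(f"record has both {old!r} and {new!r}")
            out[new] = out.pop(old)
    # v1 records have no `input` field (#280 adds it); mark it unknown rather than guess.
    out.setdefault("input", None)
    return out
```

Filling `input` with `None` instead of reconstructing it keeps the shim honest, and running it twice on the same record is a no-op.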

## Total Effort Estimate

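~8–12 days of work across the sprint issues (the estimated issues alone sum to 7–12 days; the docs issues #278 and #279 are unestimated). Because Phases 1–3 overlap, the sprint length is set by the critical path #280 → #277: ~5–8 days (2–3 + 3–5).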
