docs: add test strategy document by planetf1 · Pull Request #1167 · generative-computing/mellea

planetf1 · 2026-05-27T07:39:03Z

Summary

Closes #853 (parent epic #726).

Why

We have test/MARKERS_GUIDE.md as a marker reference and test/README.md for operational notes, but a new contributor landing on the repo has no single place that explains the strategy: what each tier means, how to decide between them, what runs in CI, how coverage works, where qualitative tests sit. They have to piece it together from pyproject.toml, the workflow YAML, three docs, and the conftest.

This PR adds docs/docs/community/testing-strategy.md as that missing top-level document. It is structured to be progressive: a short "What you need to do" action section first (first-time setup, the daily commands, a 5-step procedure for adding a test, the pre-PR checklist), then the philosophy and decision rules, per-tier definitions, authoring guidance, local workflow with explicit scoping axes, CI tiers, coverage, and finally advanced reference material on backend gating. A reader can stop wherever they have what they need rather than read the whole thing.

While writing it I also closed the broken anchor link from how-to/unit-test-generative-code.md, which has pointed at a non-existent community/contributing-guide#testing for some time. That link now resolves.

What's in scope

New page at docs/docs/community/testing-strategy.md. Sections: What you need to do, Philosophy, Classification (with backend overview, per-tier definitions, deprecated llm marker), Authoring guide, Local dev workflow (with scoping-axes table), CI pipeline (Pre-commit / PR CI / Nightly / planned tiers), Coverage, Examples as tests, Backend & resource gating.
docs.json updated to register the new page in the Community group.
how-to/unit-test-generative-code.md retargeted at the new page.
AGENTS.md §3 and §10, CONTRIBUTING.md Testing section, test/MARKERS_GUIDE.md and test/README.md all cross-link to the new page rather than duplicate strategy material. MARKERS_GUIDE.md stays the marker reference.
Several accuracy fixes uncovered while writing: corrected backend method names in mock-discipline guidance (generate_from_chat_context / generate_from_raw, not the previously-stated generate/astream); fixed CONTRIBUTING.md CI replication command (was CICD=1 uv run pytest but PR CI runs pytest test).
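To illustrate why the method-name fix matters for mock discipline, here is a minimal sketch using `unittest.mock.create_autospec`. `FakeBackend` and its argument names are hypothetical stand-ins, not Mellea's actual classes or signatures; only the two method names come from this PR. A spec'd mock rejects the non-existent `generate` and `astream` attributes, so a test written against the wrong names fails instead of silently passing.

```python
from unittest.mock import create_autospec


class FakeBackend:
    """Hypothetical stand-in for a backend class; the signatures are assumed."""

    def generate_from_chat_context(self, action, ctx):
        raise NotImplementedError

    def generate_from_raw(self, actions, ctx):
        raise NotImplementedError


def make_mock_backend():
    # instance=True makes the mock behave like a FakeBackend instance, so only
    # attributes that exist on the class can be accessed or called.
    backend = create_autospec(FakeBackend, instance=True)
    backend.generate_from_chat_context.return_value = "stubbed output"
    return backend


def is_missing_on_mock(backend, name):
    """Return True if accessing `name` on the spec'd mock raises AttributeError."""
    try:
        getattr(backend, name)
    except AttributeError:
        return True
    return False
```

With a plain `Mock()` the stale names would be accepted, which is how the earlier guidance could go unnoticed; the spec'd version surfaces the mistake at the first attribute access.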

CI tier accuracy

This was the hardest part to get right. The doc now reflects what actually runs:

Pre-commit, PR CI: fully implemented (GitHub Actions, quality.yml).
Nightly: runs today on an IBM internal LSF cluster, orchestrated outside this repo by a nightly.py driver that invokes test/scripts/run_tests_with_ollama_and_vllm.sh --group-by-backend. Failures auto-file a dated issue (#985, "Failing nightlies in 2026-04-30-0f95f84", was the most recent).
On-demand nightly for a PR: open as #734 ("test: on-demand nightly test runs for PRs") but not currently being progressed; the doc says so honestly rather than pretending it's tracked.
Pre-release: no specific issue, only the parent epic #726 ("Epic: Testing Infrastructure & Strategy Overhaul").

What's deliberately not here

Coverage uploads / trend reporting: open gap (#737, "test: test results and coverage reporting", also stale). The doc explains that coverage runs but that reports aren't currently surfaced.
MARKERS_GUIDE.md rewrite or consolidation: it stays as the marker reference. The strategy doc covers principles; the marker guide covers the lookup table.
Reconciling root CONTRIBUTING.md with Mintlify community/contributing-guide.md: out of scope; separate cleanup.

Where this fits in Epic #726

This PR closes work item #853 (test strategy document) under the parent epic. The doc is honest about what's implemented today versus what's planned, and points at the relevant child issues only where they're meaningfully tracking active work.

Testing

markdownlint clean (uv run pre-commit run markdownlint --files docs/docs/community/testing-strategy.md)
codespell clean
All cross-references verified — broken community/contributing-guide#testing anchor and stale #ci-pipeline-tiers anchor both fixed
Cross-context links use the existing AGENTS.md pattern (relative path to source markdown for source-tree files; absolute GitHub URLs from the Mintlify page to source files)
Technical claims verified against source: backend method names checked against mellea/backends/ollama.py; conftest behaviour and capability caching checked against test/conftest.py; CI behaviour checked against .github/workflows/quality.yml; pytest.ini_options claims checked against pyproject.toml; nightly script confirmed in test/scripts/run_tests_with_ollama_and_vllm.sh
Build the Mintlify site to confirm rendering (will run in docs-publish CI on merge)

Assisted-by: Claude Code

Adds docs/docs/community/testing-strategy.md, the contributor guide covering philosophy, classification rules, authoring guide, CI tier map, local workflow, examples-as-tests opt-in, and backend/resource gating. Closes the pre-existing broken link from unit-test-generative-code to community/contributing-guide#testing. MARKERS_GUIDE.md is kept as the detailed marker reference and cross-linked from the new page.

Updates: docs.json (add page to Community group), AGENTS.md §3 and §10 (pointer to strategy doc), CONTRIBUTING.md (Test Markers section), test/README.md (remove duplicated predicates table), test/MARKERS_GUIDE.md (add "See also" header).

Closes generative-computing#853
Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Rewrites testing-strategy.md following a cold read as an outsider:
- Adds backend intro table so 'why ollama vs huggingface matters for testing' is explained upfront rather than assumed knowledge
- Expands tier definitions (unit/integration/e2e/qualitative) with positive/negative indicators and explicit decision rules; qualitative is now properly framed as a sub-tier of e2e
- Adds fixture reference table (session, mock_backend, gh_run, etc.)
- Adds coverage section (htmlcov/, branch coverage, no enforced threshold)
- Fixes CI accuracy: PR CI runs 'pytest test' not 'pytest'; docs/examples are not collected in CI
- Fixes '# pytest:' comment description: takes marker names, not -m expressions
- Removes predicate table and auto-detection table (defer to MARKERS_GUIDE to avoid maintaining two identical tables); keeps the usage code example
- Clarifies qualitative tests run locally by default, skipped in CI only

CONTRIBUTING.md: trims the Testing Quick Reference to essential setup commands + pointer to strategy doc; fixes CI command (test was missing the 'test' path argument); removes duplicate tier/marker/workflow content.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Comprehensive revision of testing-strategy.md following a cold-read review:

Accuracy fixes:
- Remove fabricated fixture table (session/mock_backend/clean_metrics_env/plugins are per-file, not global conftest; only gh_run and system_capabilities are confirmed global fixtures)
- Fix coverage claim: reports run in CI but are discarded; issue generative-computing#737 tracks surfacing them. CI job summary is JUnit counts only, not coverage
- Fix 'open htmlcov/' to show both macOS and Linux commands

Clarity fixes:
- Rename test_<unit>_... naming convention to test_<subject>_... (unit collided with the tier name)
- Integration positive indicators were exclusively OTel; added component-wiring example so contributors outside telemetry can recognise their tests
- Qualitative code example no longer repeats the e2e test from the previous section
- Philosophy: clarify all tiers run locally by default, not just qualitative
- Add note about generative-computing#729 e2e→integration migration for existing tests

Structural (progressive):
- Move 'About backends' inside Classification where it's needed
- Move 'Local dev workflow' before CI pipeline (contributors need commands before they need to understand automation)
- Move Coverage after CI (it is CI output)
- Reorder so readers can stop earlier: Philosophy → Classification → Authoring → Local dev → CI → Coverage → Advanced

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
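As a rough illustration of the "JUnit counts only" point above, the sketch below tallies pass/fail/skip counts from a JUnit-style XML report like the one `pytest --junitxml` writes. The sample XML and the `junit_counts` helper are illustrative assumptions, not the code the CI job actually uses.

```python
import xml.etree.ElementTree as ET

# Illustrative report; real reports also carry one <testcase> element per test.
SAMPLE_JUNIT_XML = """<testsuites>
  <testsuite name="pytest" errors="0" failures="1" skipped="2" tests="10" time="1.5">
  </testsuite>
</testsuites>
"""


def junit_counts(xml_text):
    """Sum the count attributes across every <testsuite> in a JUnit report."""
    root = ET.fromstring(xml_text)
    # Some tools emit a bare <testsuite> root instead of a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    # Passed is derived: everything that did not fail, error, or get skipped.
    totals["passed"] = (
        totals["tests"] - totals["failures"] - totals["errors"] - totals["skipped"]
    )
    return totals
```

A summary built this way says nothing about which lines were exercised, which is why coverage needs its own reporting path.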

Final review fixes:
- Fix backend method names in mock discipline: 'generate'/'astream' do not exist on backends; actual public methods are 'generate_from_chat_context' and 'generate_from_raw'
- Fix qualitative code example: 'result' was undefined (left over from trimming the duplicated e2e test). Now shows a single self-contained qualitative example with an explanatory comment about why the assertion is non-deterministic
- LiteLLM described as 'unified Python client' rather than 'proxy' (the proxy form exists but is not how Mellea uses LiteLLM)
- Reword 'SDK boundary assertions' (jargon) to 'assertions on what an external SDK actually emitted' in the per-tier table
- Remove redundant 'Auto-applied unit marker' subsection: the same fact is stated in Philosophy, Classification > Unit, and Authoring > Markers

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Reduce em-dash count from 38 to 24 by substituting commas, colons, semicolons, full stops, and parentheses where they read as naturally or better. Em dashes remain where they're genuinely the right call (sharp asides, table-cell separators).

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The CI table previously marked nightly and pre-release as 'planned'. Nightlies actually run today on the Bluevela LSF cluster, orchestrated by an external nightly.py driver that invokes test/scripts/run_tests_with_ollama_and_vllm.sh --group-by-backend with Ollama and vLLM. Failures auto-file an issue (see generative-computing#985 for the format).

Updates:
- CI table now distinguishes Pre-commit / PR CI (GitHub Actions) / Nightly (Bluevela, scheduled, exists) / On-demand nightly (generative-computing#734, planned) / Pre-release (planned) / Local dev
- Adds 'Nightly in detail' section explaining the script, models, and --group-by-backend memory-fragmentation rationale
- Adds 'Scoping a test run' subsection with a four-axis table (tier / backend / compound expression / path) and a runtime resource-gating note, so contributors can see how to slice runs without reading every later section
- Common command lines now includes the local nightly-style invocation

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>
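The four scoping axes mentioned above can be sketched as pytest argument lists. This is illustrative only: `unit`, `e2e`, and `qualitative` are tier names from this PR, while treating `ollama` as a marker name and `test/backends` as a path are assumptions; `-m` is pytest's standard marker-expression option.

```python
import shlex


def pytest_args(marker_expr=None, path="test"):
    """Build the argv list for one scoped pytest run."""
    args = ["pytest"]
    if marker_expr:
        # -m selects tests whose markers satisfy the given expression.
        args += ["-m", marker_expr]
    args.append(path)
    return args


# One example per axis; the marker and path values are assumptions.
SCOPED_RUNS = {
    "tier": pytest_args("unit"),
    "backend": pytest_args("ollama"),
    "compound": pytest_args("e2e and not qualitative"),
    "path": pytest_args(path="test/backends"),
}


def as_shell(args):
    """Render an argv list as a copy-pasteable shell command."""
    return shlex.join(args)
```

Keeping the runs as argv lists avoids quoting mistakes in compound expressions; `as_shell` is only needed when printing a command for a human to paste.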

The doc explained concepts well but a developer wanting to actually do something had to synthesise from four different sections. Adds a short action-oriented section near the top covering:
- First-time setup (pointer to contributing-guide + the two essential Ollama commands)
- Running tests during development (three commands)
- Adding a new test (5-step procedure with links into Classification, Markers, and the rest of the doc for detail)
- Before opening a PR (the two commands CI runs, plus a pointer that GPU-backed paths only run in nightly so changes there should be validated via maintainer before merge)

Progressive structure preserved: this is a quick-reference top section, the deeper concept sections below it remain authoritative.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Three follow-ups uncovered while validating the doc:

Stale issue references:
- generative-computing#734 (on-demand nightlies) and generative-computing#737 (coverage reporting) last saw activity 2026-03-24, over two months ago. Saying 'tracked in generative-computing#734' implies active work. Reword to 'open as #N but not currently being worked on' so contributors aren't misled into expecting progress.
- Pre-release tier had no specific tracker; just point at parent epic generative-computing#726.

Cross-context links:
- 'https://generative-computing.github.io/mellea/community/testing-strategy' was used in test/MARKERS_GUIDE.md, test/README.md, and CONTRIBUTING.md but the URL returns 404; the canonical Mintlify domain isn't actually configured there. Replace with relative path to the source markdown, matching the pattern already used in AGENTS.md. Source-tree files read on GitHub get a working link in PR review and after merge.
- CONTRIBUTING.md anchor '#ci-pipeline-tiers' was stale (renamed to 'CI pipeline' / '#ci-pipeline').

Bluevela rename:
- Bluevela is an internal IBM environment, not mentioned anywhere else in the public repo. Replace with neutral 'IBM internal LSF cluster' so the published docs don't expose internal infra naming.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The audit-markers skill listed test/MARKERS_GUIDE.md as the authoritative source for marker conventions. Now that the test strategy doc owns classification decision rules and per-tier definitions, add it to the skill's Project References (above MARKERS_GUIDE.md) so the agent reads it for the 'is this unit / integration / e2e / qualitative?' question, then drops to MARKERS_GUIDE.md for marker mechanics.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

jakelorocco

I think the changes look good / the documentation is accurate.

I don't quite know how I feel about this document being something that is hosted with the Mellea documentation, but I will defer to others on that decision. Typically when I contribute to a repository, I expect the documentation for testing, etc. to be a regular *.md file with instructions, not a community doc.

Maybe we should start linking to some of these pages from our PR template? That way users don't have to go out of their way to figure out how to get their PRs in good shape?

jakelorocco · 2026-05-27T13:22:19Z

Should we remove the testing section from https://docs.mellea.ai/community/contributing-guide#testing as well then? This info seems to duplicate most of that.

planetf1 added 5 commits May 27, 2026 08:07

github-actions Bot added the documentation Improvements or additions to documentation label May 27, 2026

planetf1 added 4 commits May 27, 2026 08:46

planetf1 marked this pull request as ready for review May 27, 2026 08:01

planetf1 requested a review from a team as a code owner May 27, 2026 08:01

planetf1 requested review from jakelorocco and nrfulton May 27, 2026 08:01

jakelorocco reviewed May 27, 2026

planetf1 wants to merge 9 commits into generative-computing:main from planetf1:worktree-issue-853
