docs: add test strategy document #1167

Open
planetf1 wants to merge 9 commits into generative-computing:main from planetf1:worktree-issue-853
planetf1 (Contributor) commented May 27, 2026

Summary

Closes #853 (parent epic #726).

Why

We have test/MARKERS_GUIDE.md as a marker reference and test/README.md for operational notes, but a new contributor landing on the repo has no single place that explains the strategy: what each tier means, how to decide between them, what runs in CI, how coverage works, where qualitative tests sit. They have to piece it together from pyproject.toml, the workflow YAML, three docs, and the conftest.

This PR adds docs/docs/community/testing-strategy.md as that missing top-level document. It is structured to be progressive: a short "What you need to do" action section first (first-time setup, the daily commands, a 5-step procedure for adding a test, the pre-PR checklist), then the philosophy and decision rules, per-tier definitions, authoring guidance, local workflow with explicit scoping axes, CI tiers, coverage, and finally advanced reference material on backend gating. A reader can stop wherever they have what they need rather than read the whole thing.

While writing it I also fixed the broken link in how-to/unit-test-generative-code.md, which has pointed at the non-existent anchor community/contributing-guide#testing for some time. That link now resolves to the new page.

What's in scope

  • New page at docs/docs/community/testing-strategy.md. Sections: What you need to do, Philosophy, Classification (with backend overview, per-tier definitions, deprecated llm marker), Authoring guide, Local dev workflow (with scoping-axes table), CI pipeline (Pre-commit / PR CI / Nightly / planned tiers), Coverage, Examples as tests, Backend & resource gating.
  • docs.json updated to register the new page in the Community group.
  • how-to/unit-test-generative-code.md retargeted at the new page.
  • AGENTS.md §3 and §10, CONTRIBUTING.md Testing section, test/MARKERS_GUIDE.md and test/README.md all cross-link to the new page rather than duplicate strategy material. MARKERS_GUIDE.md stays the marker reference.
  • Several accuracy fixes for errors uncovered while writing: corrected the backend method names in the mock-discipline guidance (generate_from_chat_context / generate_from_raw, not the previously stated generate / astream); fixed the CONTRIBUTING.md CI replication command (it was CICD=1 uv run pytest, but PR CI runs pytest test).
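
To illustrate why the method-name correction matters for mock discipline, here is a minimal sketch using only the standard library. StandInBackend is a hypothetical stand-in class, not Mellea's backend; only the two method names come from this PR, and their signatures here are assumptions. A mock built with spec= rejects names that do not exist on the class, so stale guidance such as generate fails loudly instead of passing silently.

```python
from unittest.mock import MagicMock


class StandInBackend:
    """Hypothetical stand-in; only the two method names come from this PR."""

    def generate_from_chat_context(self, *args, **kwargs):
        raise NotImplementedError

    def generate_from_raw(self, *args, **kwargs):
        raise NotImplementedError


def make_mock_backend() -> MagicMock:
    # spec= restricts the mock to attributes that exist on the stand-in class.
    return MagicMock(spec=StandInBackend)


def stale_name_is_rejected(mock_backend: MagicMock) -> bool:
    """Return True if the old, non-existent method name raises AttributeError."""
    try:
        mock_backend.generate  # name from the old guidance; not on the class
    except AttributeError:
        return True
    return False


backend = make_mock_backend()
backend.generate_from_raw.return_value = "stubbed"
print(backend.generate_from_raw("prompt"))  # stubbed
print(stale_name_is_rejected(backend))  # True
```

In a real test the spec would presumably be the actual backend class rather than a stand-in, so that renamed or removed methods are caught the same way.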

CI tier accuracy

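This was the hardest part to get right. The doc now reflects what actually runs: Pre-commit; PR CI on GitHub Actions, which runs pytest test; a scheduled nightly on an IBM internal LSF cluster; and on-demand nightly and pre-release tiers that are still planned.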

What's deliberately not here

  • Coverage uploads / trend reporting: an open gap (#737, "test: test results and coverage reporting", also stale). The doc explains that coverage runs but that reports aren't currently surfaced.
  • MARKERS_GUIDE.md rewrite or consolidation — it stays as the marker reference. Strategy doc covers principles; marker guide covers the lookup table.
  • Reconciling root CONTRIBUTING.md with Mintlify community/contributing-guide.md — out of scope; separate cleanup.

Where this fits in Epic #726

This PR closes work item #853 (test strategy document) under the parent epic. The doc is honest about what's implemented today versus what's planned, and points at the relevant child issues only where they're meaningfully tracking active work.

Testing

  • markdownlint clean (uv run pre-commit run markdownlint --files docs/docs/community/testing-strategy.md)
  • codespell clean
  • All cross-references verified — broken community/contributing-guide#testing anchor and stale #ci-pipeline-tiers anchor both fixed
  • Cross-context links use the existing AGENTS.md pattern (relative path to source markdown for source-tree files; absolute GitHub URLs from the Mintlify page to source files)
  • Technical claims verified against source: backend method names checked against mellea/backends/ollama.py; conftest behaviour and capability caching checked against test/conftest.py; CI behaviour checked against .github/workflows/quality.yml; pytest.ini_options claims checked against pyproject.toml; nightly script confirmed in test/scripts/run_tests_with_ollama_and_vllm.sh
  • Pending: build the Mintlify site to confirm rendering (will run in docs-publish CI on merge)
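
As a rough illustration of the anchor check described above, the sketch below flags anchors that no longer match any heading. The slug rule (lowercase, drop punctuation, whitespace to hyphens) is an approximation of GitHub-style heading anchors and may differ from what Mintlify generates; the function names and the sample headings are hypothetical, and headings inside code fences are not handled.

```python
import re


def slugify(heading: str) -> str:
    """Approximate a GitHub-style heading slug (an assumption, not a spec)."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)  # drop punctuation
    return re.sub(r"\s", "-", text)  # whitespace becomes hyphens


def heading_slugs(markdown: str) -> set:
    """Collect slugs for every ATX heading (# through ######)."""
    pattern = re.compile(r"^#{1,6}\s+(.+?)\s*$", re.MULTILINE)
    return {slugify(m.group(1)) for m in pattern.finditer(markdown)}


def missing_anchors(markdown: str, anchors: list) -> list:
    """Return the anchors that do not match any heading slug."""
    slugs = heading_slugs(markdown)
    return [a for a in anchors if a.lstrip("#") not in slugs]


SAMPLE = """# Testing strategy
## CI pipeline
## Coverage
"""

print(missing_anchors(SAMPLE, ["#ci-pipeline", "#ci-pipeline-tiers"]))
# ['#ci-pipeline-tiers']
```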

Assisted-by: Claude Code

planetf1 added 5 commits May 27, 2026 08:07
Adds docs/docs/community/testing-strategy.md — the contributor guide
covering philosophy, classification rules, authoring guide, CI tier
map, local workflow, examples-as-tests opt-in, and backend/resource
gating. Closes the pre-existing broken link from unit-test-generative-code
to community/contributing-guide#testing. MARKERS_GUIDE.md is kept as the
detailed marker reference and cross-linked from the new page.

Updates: docs.json (add page to Community group), AGENTS.md §3 and §10
(pointer to strategy doc), CONTRIBUTING.md (Test Markers section),
test/README.md (remove duplicated predicates table), test/MARKERS_GUIDE.md
(add "See also" header).

Closes generative-computing#853

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Rewrites testing-strategy.md following a cold read as an outsider:
- Adds backend intro table so 'why ollama vs huggingface matters for
  testing' is explained upfront rather than assumed knowledge
- Expands tier definitions (unit/integration/e2e/qualitative) with
  positive/negative indicators and explicit decision rules; qualitative
  is now properly framed as a sub-tier of e2e
- Adds fixture reference table (session, mock_backend, gh_run, etc.)
- Adds coverage section (htmlcov/, branch coverage, no enforced threshold)
- Fixes CI accuracy: PR CI runs 'pytest test' not 'pytest' — docs/examples
  are not collected in CI
- Fixes '# pytest:' comment description — takes marker names, not -m
  expressions
- Removes predicate table and auto-detection table (defer to MARKERS_GUIDE
  to avoid maintaining two identical tables); keeps the usage code example
- Clarifies qualitative tests run locally by default, skipped in CI only

CONTRIBUTING.md: trims the Testing Quick Reference to essential setup
commands + pointer to strategy doc; fixes CI command (test was missing
the 'test' path argument); removes duplicate tier/marker/workflow content.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Comprehensive revision of testing-strategy.md following a cold-read review:

Accuracy fixes:
- Remove fabricated fixture table (session/mock_backend/clean_metrics_env/
  plugins are per-file, not global conftest; only gh_run and
  system_capabilities are confirmed global fixtures)
- Fix coverage claim: reports run in CI but are discarded — issue generative-computing#737
  tracks surfacing them; CI job summary is JUnit counts only, not coverage
- Fix 'open htmlcov/' to show both macOS and Linux commands

Clarity fixes:
- Rename test_<unit>_... naming convention to test_<subject>_... (unit
  collided with the tier name)
- Integration positive indicators were exclusively OTel; added component-
  wiring example so contributors outside telemetry can recognise their tests
- Qualitative code example no longer repeats the e2e test from the
  previous section
- Philosophy: clarify all tiers run locally by default, not just qualitative
- Add note about generative-computing#729 e2e→integration migration for existing tests

Structural (progressive):
- Move 'About backends' inside Classification where it's needed
- Move 'Local dev workflow' before CI pipeline (contributors need commands
  before they need to understand automation)
- Move Coverage after CI (it is CI output)
- Reorder so readers can stop earlier: Philosophy → Classification →
  Authoring → Local dev → CI → Coverage → Advanced

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Final review fixes:
- Fix backend method names in mock discipline: 'generate'/'astream' do
  not exist on backends; actual public methods are 'generate_from_chat_context'
  and 'generate_from_raw'
- Fix qualitative code example: 'result' was undefined (left over from
  trimming the duplicated e2e test). Now shows a single self-contained
  qualitative example with an explanatory comment about why the assertion
  is non-deterministic
- LiteLLM described as 'unified Python client' rather than 'proxy' (the
  proxy form exists but is not how Mellea uses LiteLLM)
- Reword 'SDK boundary assertions' (jargon) to 'assertions on what an
  external SDK actually emitted' in the per-tier table
- Remove redundant 'Auto-applied unit marker' subsection — the same fact
  is stated in Philosophy, Classification > Unit, and Authoring > Markers

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Reduce em-dash count from 38 to 24 by substituting commas, colons,
semicolons, full stops, and parentheses where they read as naturally or
better. Em dashes remain where they're genuinely the right call (sharp
asides, table-cell separators).

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

github-actions bot added the documentation label ("Improvements or additions to documentation") May 27, 2026

planetf1 added 4 commits May 27, 2026 08:46
The CI table previously marked nightly and pre-release as 'planned'.
Nightlies actually run today on the Bluevela LSF cluster, orchestrated
by an external nightly.py driver that invokes
test/scripts/run_tests_with_ollama_and_vllm.sh --group-by-backend with
Ollama and vLLM. Failures auto-file an issue (see generative-computing#985 for the format).

Updates:
- CI table now distinguishes Pre-commit / PR CI (GitHub Actions) /
  Nightly (Bluevela, scheduled, exists) / On-demand nightly (generative-computing#734,
  planned) / Pre-release (planned) / Local dev
- Adds 'Nightly in detail' section explaining the script, models, and
  --group-by-backend memory-fragmentation rationale
- Adds 'Scoping a test run' subsection with a four-axis table (tier /
  backend / compound expression / path) and a runtime resource-gating
  note, so contributors can see how to slice runs without reading
  every later section
- Common command lines now includes the local nightly-style invocation
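
The scoping axes listed above (tier, backend, compound expression, path) can be sketched as a small helper that assembles a pytest command line. This is only an illustration: the tier names come from this PR, but using backend names such as ollama as markers, the test/backends path, and the pytest_command helper itself are assumptions. The -m flag is pytest's standard marker-expression option.

```python
import shlex


def pytest_command(marker_expr: str = "", path: str = "test") -> list:
    """Build a 'uv run pytest' argv scoped by marker expression and path."""
    cmd = ["uv", "run", "pytest"]
    if marker_expr:
        cmd += ["-m", marker_expr]  # tier, backend, or compound expression
    cmd.append(path)  # path axis; PR CI collects only test/
    return cmd


# Tier axis
print(shlex.join(pytest_command("unit")))
# uv run pytest -m unit test

# Compound expression (backend marker name is an assumption)
print(shlex.join(pytest_command("e2e and ollama")))
# uv run pytest -m 'e2e and ollama' test

# Path axis (hypothetical subdirectory)
print(shlex.join(pytest_command("not qualitative", path="test/backends")))
# uv run pytest -m 'not qualitative' test/backends
```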

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The doc explained concepts well but a developer wanting to actually
do something had to synthesise from four different sections. Adds a
short action-oriented section near the top covering:

- First-time setup (pointer to contributing-guide + the two essential
  Ollama commands)
- Running tests during development (three commands)
- Adding a new test (5-step procedure with links into Classification,
  Markers, and the rest of the doc for detail)
- Before opening a PR (the two commands CI runs, plus a pointer that
  GPU-backed paths only run in nightly so changes there should be
  validated via maintainer before merge)

Progressive structure preserved: this is a quick-reference top section,
the deeper concept sections below it remain authoritative.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Three follow-ups uncovered while validating the doc:

Stale issue references:
- generative-computing#734 (on-demand nightlies) and generative-computing#737 (coverage reporting) last saw
  activity 2026-03-24, over two months ago. Saying 'tracked in generative-computing#734'
  implies active work. Reword to 'open as #N but not currently being
  worked on' so contributors aren't misled into expecting progress.
- Pre-release tier had no specific tracker; just point at parent epic
  generative-computing#726.

Cross-context links:
- 'https://generative-computing.github.io/mellea/community/testing-strategy'
  was used in test/MARKERS_GUIDE.md, test/README.md, and CONTRIBUTING.md
  but the URL returns 404 — the canonical Mintlify domain isn't actually
  configured there. Replace with relative path to the source markdown,
  matching the pattern already used in AGENTS.md. Source-tree files
  read on GitHub get a working link in PR review and after merge.
- CONTRIBUTING.md anchor '#ci-pipeline-tiers' was stale (renamed
  to 'CI pipeline' / '#ci-pipeline').

Bluevela rename:
- Bluevela is an internal IBM environment, not mentioned anywhere else
  in the public repo. Replace with neutral 'IBM internal LSF cluster'
  so the published docs don't expose internal infra naming.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

The audit-markers skill listed test/MARKERS_GUIDE.md as the
authoritative source for marker conventions. Now that the test strategy
doc owns classification decision rules and per-tier definitions, add
it to the skill's Project References (above MARKERS_GUIDE.md) so the
agent reads it for the 'is this unit / integration / e2e / qualitative?'
question, then drops to MARKERS_GUIDE.md for marker mechanics.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

planetf1 marked this pull request as ready for review May 27, 2026 08:01
planetf1 requested a review from a team as a code owner May 27, 2026 08:01
planetf1 requested review from jakelorocco and nrfulton May 27, 2026 08:01

jakelorocco (Contributor) left a comment

I think the changes look good / the documentation is accurate.

I don't quite know how I feel about this document being hosted with the Mellea documentation, but I will defer to others on that decision. Typically when I contribute to a repository, I expect the documentation for testing etc. to be a regular *.md file with instructions, not a community doc.

Maybe we should start linking to some of these pages from our PR template? That way users don't have to go out of their way to figure out how to get their PRs in good shape?

Contributor

Should we remove the testing section from https://docs.mellea.ai/community/contributing-guide#testing as well then? This info seems to duplicate most of that.

Labels

documentation: Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

docs: test strategy document — classification, authoring guide, and CI pipeline map

2 participants