test(langchain): add file-input integration tests across providers #85

Merged: cosminacho merged 2 commits into main from test/langchain-file-input-integration on May 21, 2026

@cosminacho
Collaborator

Summary

  • Adds a parameterized file-input matrix to every provider's LangChain integration suite. Each provider/model config is now exercised against 11 common file formats (txt, md, csv, html, pdf, docx, xlsx, png, jpg, gif, webp), with sync + async variants. The test sends the file through model.with_structured_output(InvoiceInfo) and asserts the model extracts the embedded invoice payload (INV-7421, customer Acme Corp, total 1234.56) — so a passing assertion means the model actually read the file, not that it returned plausible JSON.
  • Extracts the duplicated bits from every per-provider ChatModelIntegrationTests subclass into a new UiPathChatModelIntegrationTests base (tests/langchain/integration_tests.py): setup_models, chat_model_class/params, supports_* defaults, the test_stream/test_astream chunk-position overrides, the parallel-tool-calling matrix (with a _bind_parallel_and_sequential hook), and the file-input tests. Provider classes shrink to their skip_on_specific_configs rules plus an override for the Anthropic-dialect parallel-tool bind where applicable. Net: -684 / +894 lines, mostly because the new file-input methods are net-new.
  • Fixture files are generated once and committed under tests/langchain/fixtures/files/. The generators live in tests/langchain/file_fixtures.py; run python -m tests.langchain.file_fixtures to refresh them. python-docx, openpyxl, reportlab, and Pillow are added as dev-only deps so the regeneration script works.
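
As a rough illustration of the matrix and assertion described in the first bullet, the sketch below enumerates the format and mode combinations and checks an extracted result against the embedded payload. It uses a plain dataclass in place of the real `InvoiceInfo` schema so it runs without any provider. `invoice_number`, `customer_name`, `file_input_cases`, and `extraction_matches` are hypothetical names chosen for this example, not taken from the PR.

```python
from dataclasses import dataclass
from itertools import product

# The 11 formats and the two call modes named in the summary.
FILE_FORMATS = ("txt", "md", "csv", "html", "pdf", "docx", "xlsx",
                "png", "jpg", "gif", "webp")
MODES = ("sync", "async")


@dataclass(frozen=True)
class InvoiceInfo:
    # Stand-in for the structured-output schema. Only total_amount is a
    # field name that appears in the PR; the other two are assumptions.
    invoice_number: str
    customer_name: str
    total_amount: float


# Payload embedded in every fixture, as stated in the summary.
EXPECTED = InvoiceInfo("INV-7421", "Acme Corp", 1234.56)


def file_input_cases():
    """Return every (format, mode) pair one provider config would run."""
    return list(product(FILE_FORMATS, MODES))


def extraction_matches(result: InvoiceInfo, expected: InvoiceInfo = EXPECTED) -> bool:
    """True only if the result carries the payload embedded in the file."""
    return (
        result.invoice_number.strip() == expected.invoice_number
        and expected.customer_name.lower() in result.customer_name.lower()
        and abs(result.total_amount - expected.total_amount) < 0.01
    )
```

A well-formed but invented answer such as `InvoiceInfo("INV-0001", "Example Inc", 100.0)` fails `extraction_matches`, which is the point of embedding a known payload: schema-valid output alone does not pass.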

Scope notes

  • Legacy .doc / .xls are intentionally not covered — no pure-Python writer exists and most LLMs can't read them. We test the modern .docx / .xlsx formats (extracted to text client-side before sending), and .pdf plus images go through standard LangChain file / image content blocks.
  • Existing per-provider skips are preserved verbatim; the new file-input tests inherit them and add format-aware skips where a provider can't handle a particular block type (e.g. PDF on GPT chat-completions, image/PDF for Claude via the normalized API, structured output + Anthropic thinking, etc.).
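
The routing described in the scope notes could look roughly like the sketch below. It is not the PR's code: `to_content_block` and the `extract_text` callback are hypothetical, sending txt/md/csv/html as plain text blocks is an assumption, and the dict keys follow the base64 content-block layout used by langchain-core 0.3 (newer releases use different key names, so treat the exact keys as an assumption as well).

```python
import base64
from typing import Callable, Optional

TEXT_FORMATS = {"txt", "md", "csv", "html"}
EXTRACTED_FORMATS = {"docx", "xlsx"}  # converted to text client-side
IMAGE_MIME = {
    "png": "image/png",
    "jpg": "image/jpeg",
    "gif": "image/gif",
    "webp": "image/webp",
}


def to_content_block(
    fmt: str,
    data: bytes,
    extract_text: Optional[Callable[[bytes], str]] = None,
) -> dict:
    """Map a fixture's format and raw bytes to one message content block."""
    if fmt in TEXT_FORMATS:
        return {"type": "text", "text": data.decode("utf-8")}
    if fmt in EXTRACTED_FORMATS:
        # The caller supplies the extractor (e.g. one built on python-docx
        # or openpyxl); this sketch does not implement extraction itself.
        if extract_text is None:
            raise ValueError(f"{fmt} needs an extract_text callback")
        return {"type": "text", "text": extract_text(data)}
    encoded = base64.b64encode(data).decode("ascii")
    if fmt == "pdf":
        return {"type": "file", "source_type": "base64",
                "mime_type": "application/pdf", "data": encoded}
    if fmt in IMAGE_MIME:
        return {"type": "image", "source_type": "base64",
                "mime_type": IMAGE_MIME[fmt], "data": encoded}
    raise ValueError(f"unsupported format: {fmt}")
```

Keeping the routing in one function also gives format-aware skips a single place to key off: a provider that cannot accept `file` or `image` blocks can be skipped by block type rather than by file extension.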

Test plan

  • ruff check && ruff format --check
  • pyright (clean, 0 errors)
  • pytest tests/core → 455 passed, 35 skipped
  • pytest tests/langchain/clients/*/test_unit.py tests/langchain/features tests/langchain/smoke_test.py → 197 passed, 24 skipped
  • pytest tests/langchain/clients/normalized/test_integration.py::TestNormalizedIntegrationChatModel::test_file_inputs → 52 passed, 14 skipped (cassettes recorded locally)
  • Spot-checked refactored test_stream / test_astream / test_parallel_and_sequential_tool_calling against existing cassettes for openai, anthropic, bedrock, litellm, google, vertexai — all green
  • Maintainer to record the remaining cassettes for the new file-input tests across the other providers before merging

🤖 Generated with Claude Code

cosminacho and others added 2 commits May 20, 2026 19:21
Adds a parameterized matrix that exercises every provider's chat model
against the 11 most common file formats (txt, md, csv, html, pdf, docx,
xlsx, png, jpg, gif, webp) via structured output. Each fixture embeds a
known invoice payload so the assertion verifies the model actually read
the file rather than hallucinating.

Refactors the per-provider integration suites onto a shared
`UiPathChatModelIntegrationTests` base that consolidates the setup
fixture, `chat_model_class`/`params`, `supports_*` defaults, the
`test_stream`/`test_astream` overrides, and the parallel-tool-calling
matrix. Providers only carry their `skip_on_specific_configs` rules and
override `_bind_parallel_and_sequential` when they need the Anthropic
dialect.
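
A minimal sketch of the hook pattern described above, assuming the hook receives a model and a tool list and returns a (parallel, sequential) pair; the real signature of `_bind_parallel_and_sequential` is not shown in this PR text. `FakeChatModel`, `BaseIntegrationTests`, and `AnthropicDialectTests` are hypothetical stand-ins so the example runs without any provider.

```python
class FakeChatModel:
    """Records bind_tools arguments; stands in for a real chat model."""

    def __init__(self, tools=None, **bind_kwargs):
        self.tools = tools
        self.bind_kwargs = bind_kwargs

    def bind_tools(self, tools, **kwargs):
        # Return a new object, mirroring how binding does not mutate the model.
        return FakeChatModel(tools, **kwargs)


class BaseIntegrationTests:
    """Shared logic lives in the base; providers override only the hook."""

    def _bind_parallel_and_sequential(self, model, tools):
        # Assumed default: OpenAI-style boolean flag.
        parallel = model.bind_tools(tools, parallel_tool_calls=True)
        sequential = model.bind_tools(tools, parallel_tool_calls=False)
        return parallel, sequential


class AnthropicDialectTests(BaseIntegrationTests):
    """Provider suite that needs the Anthropic way of disabling parallelism."""

    def _bind_parallel_and_sequential(self, model, tools):
        # Anthropic expresses this through tool_choice rather than a flag.
        parallel = model.bind_tools(tools)
        sequential = model.bind_tools(
            tools,
            tool_choice={"type": "auto", "disable_parallel_tool_use": True},
        )
        return parallel, sequential
```

With this shape, the test body that exercises parallel and sequential tool calling stays in the base class, and a provider subclass changes only how the two variants are bound.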

Fixture files are generated once and committed under
`tests/langchain/fixtures/files/`; `tests/langchain/file_fixtures.py`
keeps the generators (run `python -m tests.langchain.file_fixtures` to
refresh). Tests load the committed bytes via `load_fixture(fmt)`.
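
The generate-once, load-committed-bytes flow might be sketched as follows. This is an illustration only: the `invoice.<fmt>` file naming, the `write_text_fixtures` helper, and the extra `directory` argument on `load_fixture` are assumptions made so the example can run against a temporary directory.

```python
import tempfile
from pathlib import Path


def write_text_fixtures(directory: Path, payload: str) -> None:
    """Generator side: write a couple of text fixtures containing the payload."""
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "invoice.txt").write_bytes(payload.encode("utf-8"))
    (directory / "invoice.md").write_bytes(f"# Invoice\n\n{payload}\n".encode("utf-8"))


def load_fixture(fmt: str, directory: Path) -> bytes:
    """Test side: read the committed bytes and never regenerate at test time."""
    path = directory / f"invoice.{fmt}"
    if not path.exists():
        raise FileNotFoundError(
            f"missing fixture {path.name}; "
            "regenerate with: python -m tests.langchain.file_fixtures"
        )
    return path.read_bytes()


# Usage: generate into a scratch directory, then load the bytes back.
with tempfile.TemporaryDirectory() as tmp:
    fixtures_dir = Path(tmp) / "files"
    write_text_fixtures(fixtures_dir, "Invoice INV-7421 for Acme Corp")
    sample = load_fixture("txt", fixtures_dir)
```

Loading bytes rather than regenerating keeps test inputs byte-identical across runs, which matters when recorded cassettes are matched against request bodies.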

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bedrock-hosted Claude and a few Gemini configs reliably OCR the invoice
text from images but drop the decimal point: "$1234.56" comes back as
`total_amount=123456.0`. That caused 29 of the new file-input tests to
fail: deterministically on Bedrock images and intermittently on Gemini
thinking runs.

Switching the fixture amount to a whole-dollar value ($4200) removes the
ambiguity entirely. Pydantic still coerces to `float`, so the schema
field and assertion are unchanged. Regenerated all 11 committed fixture
files and re-recorded the affected cassettes.
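
To make the reasoning concrete, the toy function below simulates the failure mode (the decimal point is lost when the amount is read from an image) and shows why a whole-dollar amount is immune to it. `drop_decimal_point` is a hypothetical helper for illustration, not a model of what any provider actually does internally.

```python
def drop_decimal_point(amount_text: str) -> float:
    """Simulate reading an amount from an image while losing the '.'."""
    digits = amount_text.replace("$", "").replace(".", "")
    return float(digits)


# The original fixture amount is corrupted by two orders of magnitude.
old_amount = drop_decimal_point("$1234.56")   # 123456.0, not 1234.56

# The whole-dollar amount has no decimal point to lose.
new_amount = drop_decimal_point("$4200")      # 4200.0, still correct
```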

Full file-input run: 364 passed, 318 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cosminacho force-pushed the test/langchain-file-input-integration branch from 34b4bc4 to 34b8adc on May 20, 2026 16:21
@cosminacho merged commit 202136e into main on May 21, 2026
11 checks passed
@cosminacho deleted the test/langchain-file-input-integration branch on May 21, 2026 05:49