test(langchain): add file-input integration tests across providers #85
Merged
Adds a parameterized matrix that exercises every provider's chat model against the 11 most common file formats (txt, md, csv, html, pdf, docx, xlsx, png, jpg, gif, webp) via structured output. Each fixture embeds a known invoice payload so the assertion verifies the model actually read the file rather than hallucinating.

Refactors the per-provider integration suites onto a shared `UiPathChatModelIntegrationTests` base that consolidates the setup fixture, `chat_model_class`/`params`, `supports_*` defaults, the `test_stream`/`test_astream` overrides, and the parallel-tool-calling matrix. Providers only carry their `skip_on_specific_configs` rules and override `_bind_parallel_and_sequential` when they need the Anthropic dialect.

Fixture files are generated once and committed under `tests/langchain/fixtures/files/`; `tests/langchain/file_fixtures.py` keeps the generators (run `python -m tests.langchain.file_fixtures` to refresh). Tests load the committed bytes via `load_fixture(fmt)`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
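A minimal sketch of how the committed fixtures might be loaded per format. The format list comes from the commit message; the `invoice.<fmt>` file naming, the `FIXTURE_DIR` constant, and the `fixture_dir` parameter are assumptions made for illustration and may not match the real `load_fixture` helper.

```python
from pathlib import Path

# The eleven formats named in the commit message.
FILE_FORMATS = [
    "txt", "md", "csv", "html", "pdf", "docx",
    "xlsx", "png", "jpg", "gif", "webp",
]

# Directory named in the commit message (relative to the repository root).
FIXTURE_DIR = Path("tests/langchain/fixtures/files")


def load_fixture(fmt: str, fixture_dir: Path = FIXTURE_DIR) -> bytes:
    """Return the committed bytes for one format.

    The `invoice.<fmt>` naming scheme is hypothetical.
    """
    if fmt not in FILE_FORMATS:
        raise ValueError(f"unsupported fixture format: {fmt!r}")
    return (fixture_dir / f"invoice.{fmt}").read_bytes()
```

In a pytest suite the same list could drive `@pytest.mark.parametrize("fmt", FILE_FORMATS)`, so each format becomes its own test case and no fixture is regenerated at test time.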
Bedrock-hosted Claude and a few Gemini configs reliably OCR the invoice text from images but drop the decimal point: "$1234.56" comes back as `total_amount=123456.0`. That made 29 of the new file-input tests fail deterministically on Bedrock images and behave flakily on Gemini thinking runs.

Switching the fixture amount to a whole-dollar value ($4200) removes the ambiguity entirely. Pydantic still coerces the value to `float`, so the schema field and assertion are unchanged.

Regenerated all 11 committed fixture files and re-recorded the affected cassettes. Full file-input run: 364 passed, 318 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
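The failure mode can be reproduced in isolation. The sketch below simulates a reader that loses the decimal point and shows why a whole-dollar amount is immune; both helper functions are hypothetical stand-ins, not code from this PR.

```python
import re


def parse_amount(text: str) -> float:
    """Parse an amount the way a correct reader would."""
    return float(text.replace("$", "").replace(",", ""))


def drop_decimal_point(text: str) -> float:
    """Simulate the observed OCR failure: every non-digit, including '.', is lost."""
    return float(re.sub(r"[^0-9]", "", text))


# With cents, a dropped decimal point changes the value by a factor of 100.
assert parse_amount("$1234.56") == 1234.56
assert drop_decimal_point("$1234.56") == 123456.0

# A whole-dollar amount has no decimal point to lose, so both readings agree.
assert parse_amount("$4200") == drop_decimal_point("$4200") == 4200.0
```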
34b4bc4 to
34b8adc
ionut-mihalache-uipath approved these changes May 20, 2026
Summary
- Adds a parameterized file-input test that calls `model.with_structured_output(InvoiceInfo)` and asserts the model extracts the embedded invoice payload (`INV-7421`, customer `Acme Corp`, total `1234.56`), so a passing assertion means the model actually read the file, not that it returned plausible JSON.
- Moves the shared pieces of each provider's `ChatModelIntegrationTests` subclass into a new `UiPathChatModelIntegrationTests` base (`tests/langchain/integration_tests.py`): `setup_models`, `chat_model_class`/`params`, `supports_*` defaults, the `test_stream`/`test_astream` chunk-position overrides, the parallel-tool-calling matrix (with a `_bind_parallel_and_sequential` hook), and the file-input tests. Provider classes shrink to their `skip_on_specific_configs` rules plus an override for the Anthropic-dialect parallel-tool bind where applicable. Net: -684 / +894 lines, mostly because the new file-input methods are net-new.
- Fixture files are generated once and committed under `tests/langchain/fixtures/files/`. The generators live in `tests/langchain/file_fixtures.py`; run `python -m tests.langchain.file_fixtures` to refresh them.
- `python-docx`, `openpyxl`, `reportlab`, and `Pillow` are added as dev-only deps so the regeneration script works.

Scope notes

- `.doc`/`.xls` are intentionally not covered: no pure-Python writer exists and most LLMs can't read them. We test the modern `.docx`/`.xlsx` formats (extracted to text client-side before sending), and `.pdf` plus images go through standard LangChain `file`/`image` content blocks.

Test plan

- `ruff check && ruff format --check`
- `pyright` (clean, 0 errors)
- `pytest tests/core` → 455 passed, 35 skipped
- `pytest tests/langchain/clients/*/test_unit.py tests/langchain/features tests/langchain/smoke_test.py` → 197 passed, 24 skipped
- `pytest tests/langchain/clients/normalized/test_integration.py::TestNormalizedIntegrationChatModel::test_file_inputs` → 52 passed, 14 skipped (cassettes recorded locally)
- `test_stream` / `test_astream` / `test_parallel_and_sequential_tool_calling` against existing cassettes for openai, anthropic, bedrock, litellm, google, vertexai: all green

🤖 Generated with Claude Code
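The shared-base-plus-hook arrangement described in the Summary can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `FakeChatModel` and both test classes are hypothetical, and the keyword arguments only illustrate how two tool-calling dialects can differ (a boolean `parallel_tool_calls` flag versus a `disable_parallel_tool_use` entry inside `tool_choice`).

```python
from typing import Any


class FakeChatModel:
    """Hypothetical stand-in that records what bind_tools was called with."""

    def bind_tools(self, tools: list[Any], **kwargs: Any) -> dict[str, Any]:
        return {"tools": tools, **kwargs}


class BaseIntegrationTestsSketch:
    """Shared base: the default binding lives here, providers override the hook."""

    def _bind_parallel_and_sequential(self, model: Any, tools: list[Any]) -> tuple[Any, Any]:
        # Assumed default dialect: a boolean flag toggles parallel tool calls.
        return (
            model.bind_tools(tools, parallel_tool_calls=True),
            model.bind_tools(tools, parallel_tool_calls=False),
        )


class AnthropicStyleTestsSketch(BaseIntegrationTestsSketch):
    """Provider class that only overrides the dialect-specific hook."""

    def _bind_parallel_and_sequential(self, model: Any, tools: list[Any]) -> tuple[Any, Any]:
        # Assumed Anthropic-style dialect: parallelism is disabled via tool_choice.
        return (
            model.bind_tools(tools, tool_choice={"type": "auto"}),
            model.bind_tools(
                tools,
                tool_choice={"type": "auto", "disable_parallel_tool_use": True},
            ),
        )
```

Because the test body in the base class only ever calls the hook, a provider that needs a different dialect overrides one method and inherits everything else unchanged.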