test(langchain): add file-input integration tests across providers by cosminacho · Pull Request #85 · UiPath/uipath-llm-client-python

cosminacho · 2026-05-20T14:38:22Z

Summary

Adds a parameterized file-input matrix to every provider's LangChain integration suite. Each provider/model config is now exercised against 11 common file formats (txt, md, csv, html, pdf, docx, xlsx, png, jpg, gif, webp), with sync + async variants. The test sends the file through model.with_structured_output(InvoiceInfo) and asserts the model extracts the embedded invoice payload (INV-7421, customer Acme Corp, total 1234.56) — so a passing assertion means the model actually read the file, not that it returned plausible JSON.
Extracts the duplicated bits from every per-provider ChatModelIntegrationTests subclass into a new UiPathChatModelIntegrationTests base (tests/langchain/integration_tests.py): setup_models, chat_model_class/params, supports_* defaults, the test_stream/test_astream chunk-position overrides, the parallel-tool-calling matrix (with a _bind_parallel_and_sequential hook), and the file-input tests. Provider classes shrink to their skip_on_specific_configs rules plus an override for the Anthropic-dialect parallel-tool bind where applicable. Net: -684 / +894 lines, mostly because the new file-input methods are net-new.
Fixture files are generated once and committed under tests/langchain/fixtures/files/. The generators live in tests/langchain/file_fixtures.py; run python -m tests.langchain.file_fixtures to refresh them. python-docx, openpyxl, reportlab, and Pillow are added as dev-only deps so the regeneration script works.
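
The structured-output check described in the summary can be sketched as follows. This is a minimal illustration, not the PR's code: a stdlib dataclass stands in for the Pydantic `InvoiceInfo` schema so the snippet runs without dependencies, and the field names `invoice_number` and `customer_name`, the `EXPECTED` constant, and the `matches_fixture` helper are all assumptions (only `total_amount` is named in the PR).

```python
from dataclasses import dataclass


@dataclass
class InvoiceInfo:
    # Stand-in for the PR's Pydantic schema. Only `total_amount` appears in
    # the PR text; the other two field names are hypothetical.
    invoice_number: str
    customer_name: str
    total_amount: float


# Values the PR description says are embedded in every fixture file.
EXPECTED = InvoiceInfo("INV-7421", "Acme Corp", 1234.56)


def matches_fixture(result: InvoiceInfo, expected: InvoiceInfo = EXPECTED) -> bool:
    """Return True only if every embedded value was recovered from the file."""
    return (
        result.invoice_number.strip() == expected.invoice_number
        and result.customer_name.strip().lower() == expected.customer_name.lower()
        and abs(result.total_amount - expected.total_amount) < 0.01
    )
```

Because the expected values exist only inside the fixture files, a model that returns well-formed but invented invoice data fails this comparison.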

Scope notes

Legacy .doc / .xls are intentionally not covered: no pure-Python writer exists and most LLMs can't read them. The modern .docx / .xlsx formats are tested instead (extracted to text client-side before sending), while .pdf and images go through standard LangChain file / image content blocks.
Existing per-provider skips are preserved verbatim; the new file-input tests inherit them and add format-aware skips where a provider can't handle a particular block type (e.g. PDF on GPT chat-completions, image/PDF for Claude via the normalized API, structured output + Anthropic thinking).
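
The routing described in the scope notes might look roughly like the sketch below. Everything here is an assumption for illustration: the `build_content_block` helper is hypothetical, the PR does not say how txt/md/csv/html are sent (plain text blocks are assumed), and the dict keys follow one version of LangChain's standard content-block shape (`source_type` / `data` / `mime_type`), which may differ from the shape the installed LangChain version expects.

```python
import base64
from typing import Optional

TEXT_FORMATS = {"txt", "md", "csv", "html"}  # assumed to be sent as plain text
EXTRACTED_FORMATS = {"docx", "xlsx"}  # converted to text client-side per the PR
IMAGE_MIME = {
    "png": "image/png",
    "jpg": "image/jpeg",
    "gif": "image/gif",
    "webp": "image/webp",
}


def build_content_block(fmt: str, data: bytes, extracted_text: Optional[str] = None) -> dict:
    """Hypothetical helper: map a fixture format to a message content block."""
    if fmt in TEXT_FORMATS:
        return {"type": "text", "text": data.decode("utf-8")}
    if fmt in EXTRACTED_FORMATS:
        # The caller extracts the text (e.g. with python-docx or openpyxl).
        if extracted_text is None:
            raise ValueError(f"{fmt} must be extracted to text before sending")
        return {"type": "text", "text": extracted_text}
    encoded = base64.b64encode(data).decode("ascii")
    if fmt == "pdf":
        return {
            "type": "file",
            "source_type": "base64",
            "data": encoded,
            "mime_type": "application/pdf",
        }
    if fmt in IMAGE_MIME:
        return {
            "type": "image",
            "source_type": "base64",
            "data": encoded,
            "mime_type": IMAGE_MIME[fmt],
        }
    raise ValueError(f"unsupported format: {fmt}")
```

Keeping the format-to-block mapping in one place also gives a natural hook for the format-aware skips: a provider that cannot accept `file` or `image` blocks can be skipped based on the block type alone.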

Test plan

ruff check && ruff format --check
pyright (clean, 0 errors)
pytest tests/core → 455 passed, 35 skipped
pytest tests/langchain/clients/*/test_unit.py tests/langchain/features tests/langchain/smoke_test.py → 197 passed, 24 skipped
pytest tests/langchain/clients/normalized/test_integration.py::TestNormalizedIntegrationChatModel::test_file_inputs → 52 passed, 14 skipped (cassettes recorded locally)
Spot-checked refactored test_stream / test_astream / test_parallel_and_sequential_tool_calling against existing cassettes for openai, anthropic, bedrock, litellm, google, vertexai — all green
Maintainer to record the remaining cassettes for the new file-input tests across the other providers before merging

🤖 Generated with Claude Code

Adds a parameterized matrix that exercises every provider's chat model against the 11 most common file formats (txt, md, csv, html, pdf, docx, xlsx, png, jpg, gif, webp) via structured output. Each fixture embeds a known invoice payload so the assertion verifies the model actually read the file rather than hallucinating.

Refactors the per-provider integration suites onto a shared `UiPathChatModelIntegrationTests` base that consolidates the setup fixture, `chat_model_class`/`params`, `supports_*` defaults, the `test_stream`/`test_astream` overrides, and the parallel-tool-calling matrix. Providers only carry their `skip_on_specific_configs` rules and override `_bind_parallel_and_sequential` when they need the Anthropic dialect.

Fixture files are generated once and committed under `tests/langchain/fixtures/files/`; `tests/langchain/file_fixtures.py` keeps the generators (run `python -m tests.langchain.file_fixtures` to refresh). Tests load the committed bytes via `load_fixture(fmt)`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
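
As a rough picture of the matrix size, the sketch below enumerates the 11 formats against the sync and async variants mentioned in the PR description. The names `FILE_FORMATS`, `MODES`, `CASES`, and `CASE_IDS` are illustrative and not taken from the PR; the actual suite may parameterize differently.

```python
from itertools import product

# The 11 formats listed in the commit message.
FILE_FORMATS = (
    "txt", "md", "csv", "html", "pdf", "docx",
    "xlsx", "png", "jpg", "gif", "webp",
)
# The PR description says each format runs in sync and async variants.
MODES = ("sync", "async")

# One case per (format, mode) pair, for each provider/model config.
CASES = list(product(FILE_FORMATS, MODES))

# Readable ids of the kind a parameterized test runner would display.
CASE_IDS = [f"{fmt}-{mode}" for fmt, mode in CASES]
```

Under these assumptions each provider/model config contributes 22 cases before any skips are applied, which is why the skip rules matter for keeping the recorded cassette set manageable.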

Bedrock-hosted Claude and a few Gemini configs reliably OCR the invoice text from images but drop the decimal point: "$1234.56" comes back as `total_amount=123456.0`. That made 29 of the new file-input tests fail deterministically on Bedrock images and turn flaky on Gemini thinking runs.

Switching the fixture amount to a whole-dollar value ($4200) removes the ambiguity entirely. Pydantic still coerces to `float`, so the schema field and assertion are unchanged.

Regenerated all 11 committed fixture files and re-recorded the affected cassettes. Full file-input run: 364 passed, 318 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
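
The failure mode and the fix can be reproduced in a few lines. This is only a simulation of the OCR behaviour described above; `drop_decimal_point` and `parse_amount` are hypothetical helpers, not code from the PR.

```python
def parse_amount(amount_text: str) -> float:
    """Parse an amount such as "$1234.56" the way a correct read would."""
    return float(amount_text.lstrip("$"))


def drop_decimal_point(amount_text: str) -> float:
    """Simulate the OCR failure: the '.' is lost but every digit survives."""
    digits = "".join(ch for ch in amount_text if ch.isdigit())
    return float(digits)


# With cents, losing the decimal point changes the value by a factor of 100.
assert parse_amount("$1234.56") == 1234.56
assert drop_decimal_point("$1234.56") == 123456.0

# A whole-dollar amount has no decimal point to lose, so both reads agree.
assert drop_decimal_point("$4200") == parse_amount("$4200") == 4200.0
```

The whole-dollar fixture keeps the assertion strict while removing the one character whose loss the models could not be trusted to avoid.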

cosminacho requested review from DragosBobolea, cristipufu, dragosvelcea, ionmincu, ionut-mihalache-uipath and radu-mocanu as code owners May 20, 2026 14:38

cosminacho had a problem deploying to LLMGW_SETTINGS May 20, 2026 14:39 — with GitHub Actions Failure

cosminacho temporarily deployed to LLMGW_SETTINGS May 20, 2026 15:41 — with GitHub Actions Inactive

cosminacho and others added 2 commits May 20, 2026 19:21

cosminacho force-pushed the test/langchain-file-input-integration branch from 34b4bc4 to 34b8adc Compare May 20, 2026 16:21

cosminacho temporarily deployed to LLMGW_SETTINGS May 20, 2026 16:22 — with GitHub Actions Inactive

ionut-mihalache-uipath approved these changes May 20, 2026

cosminacho merged commit 202136e into main May 21, 2026
11 checks passed

cosminacho deleted the test/langchain-file-input-integration branch May 21, 2026 05:49
