fix(channels/discord): surface image attachments to text-only providers#1143
Open
benhoverter wants to merge 5 commits intoRightNow-AI:mainfrom
Open
fix(channels/discord): surface image attachments to text-only providers#1143benhoverter wants to merge 5 commits intoRightNow-AI:mainfrom
benhoverter wants to merge 5 commits intoRightNow-AI:mainfrom
Conversation
Contributor
Author
|
On the Security Audit failure: this PR's Zero deps added, removed, or bumped. The same advisories fire on |
Pick 3a of the Discord file-passing plan: extend the URL-flavored File variant with optional mime and size metadata so adapters can pass attachment context through to bridges. FileData (bytes-flavored) is unchanged; size is implicit in data.len() and mime_type already exists. Match-arm sites in bridge.rs, telegram.rs, whatsapp.rs use `..` to stay forward-compatible. Construction sites in telegram.rs and kernel.rs pass `mime: None, size: None` for now; Discord inbound (PR-A) will populate them. Refs: projects/openfang-fork/discord-file-passing-plan.md
Pick 3a-bis of the Discord file-passing plan: teach the multimodal
image fetcher to handle file:// URLs by reading from local disk
instead of going through reqwest. PR-A (Discord inbound) will
materialize attachments to a shared inbox dir and emit
ChannelContent::Image { url: "file://..." }, so this branch is what
unblocks vision on inbox-materialized images after the Discord CDN
URL has expired.
Implementation:
- Branch on url.strip_prefix("file://"); local read uses tokio::fs::read.
- HTTP path unchanged. Both paths converge on (Vec<u8>, Option<String>)
before the existing 5MB cap, magic-byte sniffing, and base64 path.
- No content-type header on file:// — magic-byte detection and URL
extension fallback do all the media-type work, which is fine since
detect_image_magic and media_type_from_url already exist.
- No new deps. Vec<u8> instead of bytes::Bytes to avoid pulling in
the bytes crate as a direct dep.
- No URL percent-decoding: the inbox writer (PR-A) controls filenames
and avoids characters that would need encoding.
Refs: projects/openfang-fork/discord-file-passing-plan.md (step 2)
Adds a single tracing::debug! at the top of parse_discord_message that dumps the full payload JSON. Silent at default `info` level; enable with `RUST_LOG=openfang_channels::discord=debug` to capture real attachment JSON when developing the file-passing parse code. Logs before any filters (bot, allowed_users, allowed_guilds, empty content) so attachment-only messages are visible too.
Discord MESSAGE_CREATE payloads with attachments were previously parsed in a way that either dropped the attachment (when text was present, only the text was kept) or dropped the whole message (when text was empty, the early `content.is_empty()` return killed bare-image posts). The result on text-only providers like claude-code: silent drops, then hallucinated acknowledgements of content the model never saw. This rewires the inbound path end-to-end: * types: add ChannelContent::Multipart(Vec<ChannelContent>) so a single inbound message can carry a caption + one or more attachments as sibling blocks. Doc forbids nesting; consumers debug_assert. * discord: classify attachments by MIME (with extension fallback for bot-relayed payloads that omit content_type) and a 5 MB vision-size cap matching Anthropic's image block limit. Vision-eligible images become ChannelContent::Image; everything else becomes File. Emit Multipart whenever text and attachments coexist, or when there are multiple attachments. * bridge: flat-map Multipart in both dispatch paths — into Vec<ContentBlock> for multimodal-capable providers, and into a newline-joined text descriptor for text-flatten providers. * telegram: add the Multipart arm to send_to_user for exhaustive-match parity; flattens defensively. * claude_code driver: render Image blocks as "[attachment: <mime> image, ~N KB — not viewable on this provider]" instead of dropping them. The model still cannot see the image, but it can acknowledge it coherently rather than confabulating. Adds 9 discord parser tests covering all (text, attachment-count) shapes plus MIME edge cases, and 2 claude_code driver tests covering captioned and bare-image rendering.
b6cccd8 to
118eace
Compare
4 tasks
Discord's CDN edges occasionally advertise `content-encoding: gzip` (or deflate/brotli) on PNG/JPEG passthroughs while the body is raw, uncompressed image bytes. With the default `reqwest::Client::new()` and the workspace's gzip/deflate/brotli features all enabled, reqwest's transparent-decompression layer chokes on the PNG/JPEG header and returns "error decoding response body" only on `bytes().await` (not on `send()`), causing `download_image_to_blocks` to silently fall back to a text-only block — the user's image never reaches the model. Build the client explicitly with no_gzip/no_deflate/no_brotli so the request advertises identity encoding and the body is read raw. Also set a User-Agent (some CDN edges 403 clients without one) and a 30s timeout aligned with the upstream 5 MB cap. Repro: send an image attachment via Discord; the daemon logs `Failed to read image bytes: error decoding response body` and the turn appends as text-only with `appended_has_image=false`. After this fix the PNG bytes are read and emitted as an Image content block as intended.
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #1142.
Discord image attachments were silently dropped before reaching the model on text-only providers. Captioned images had attachments discarded (caption-wins path); bare-image messages were dropped entirely (early
content.is_empty()return). Result: hallucinated acknowledgements of unseen images, or no response at all.This PR rewires the inbound attachment path end-to-end so attachments survive parsing, dispatch coherently across vision-capable and text-only providers, and render as a clear marker on text-only providers instead of being silently elided.
Changes
openfang-channelstypes: newChannelContent::Multipart(Vec<ChannelContent>)variant for caption + attachment(s) as sibling blocks. Nesting forbidden (doc +debug_assert).content_type. 5 MB vision-size cap matching Anthropic's image-block limit; over-cap images classify asFile. EmitsMultipartwhenever text + attachments coexist or multiple attachments are present.Multipartin both dispatch paths —Vec<ContentBlock>for multimodal-capable providers, newline-joined text descriptor for text-flatten providers.claude_codedriver: rendersImageblocks as[attachment: <mime> image, ~N KB — not viewable on this provider]instead of dropping them. The model still cannot see the image, but it can acknowledge it coherently rather than confabulating.Out of scope (follow-ups): persisting attachments to disk; vision-provider dispatch refinements beyond exposing existing image bytes; non-Discord inbound attachment classification; handling for files larger than the 5 MB vision cap.
Testing
cargo clippy --workspace --all-targets -- -D warningspassescargo test --workspacepassesLive smoke test on a
local-maindaemon build, both shapes:"Deployed..."+ PNG) → daemon log showsMultipart([Text, Image{...}]); model prompt contains caption +[attachment: image/png image, ~3172 KB — not viewable on this provider]; LLM acknowledgement coherent.[attachment: image/png image, ~2774 KB — not viewable on this provider]; LLM acknowledgement coherent.Added 9 parser unit tests covering all
(text-empty, n-attachments)shapes plus MIME edge cases (HEIC, oversize, missingcontent_type), and 2 driver unit tests covering captioned and bare-image marker rendering.CI note: a pre-existing
cargo fmt --all -- --checkfailure onmain(introduced inda6b567aand earlier; tracked by #1121) is inherited by this branch. None of the failing fmt lines are in files this PR modifies. Happy to address as a separate fmt-only PR if maintainers prefer.Security