fix(channels/discord): surface image attachments to text-only providers by benhoverter · Pull Request #1143 · RightNow-AI/openfang

benhoverter · 2026-04-30T22:48:52Z

Summary

Discord image attachments were silently dropped before reaching the model on text-only providers. Captioned images had attachments discarded (caption-wins path); bare-image messages were dropped entirely (early content.is_empty() return). Result: hallucinated acknowledgements of unseen images, or no response at all.

This PR rewires the inbound attachment path end-to-end so attachments survive parsing, dispatch coherently across vision-capable and text-only providers, and render as a clear marker on text-only providers instead of being silently elided.

Changes

openfang-channels types: new ChannelContent::Multipart(Vec<ChannelContent>) variant for caption + attachment(s) as sibling blocks. Nesting forbidden (doc + debug_assert).
Discord parser: rewritten to handle bare-image and captioned-image shapes correctly. MIME classification with filename-extension fallback for payloads missing content_type. 5 MB vision-size cap matching Anthropic's image-block limit; over-cap images classify as File. Emits Multipart whenever text + attachments coexist or multiple attachments are present.
Bridge: flat-maps Multipart in both dispatch paths — Vec<ContentBlock> for multimodal-capable providers, newline-joined text descriptor for text-flatten providers.
Telegram channel: exhaustive-match parity for the new variant; defensive outbound flatten.
claude_code driver: renders Image blocks as [attachment: <mime> image, ~N KB — not viewable on this provider] instead of dropping them. The model still cannot see the image, but it can acknowledge it coherently rather than confabulating.
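
The Multipart variant and the text-flatten dispatch path described above might look roughly like the sketch below. This is a simplified stand-in, not the PR's code: the field names on `Image` and `File`, the `flatten_to_text` helper, and the bracketed descriptor strings are all assumptions made for illustration.

```rust
/// Hypothetical, simplified stand-in for the channel content type.
#[derive(Debug, Clone, PartialEq)]
pub enum ChannelContent {
    Text(String),
    Image { url: String, mime: Option<String> },
    File { url: String, filename: String },
    /// Caption plus attachment(s) as sibling blocks. Nesting is forbidden.
    Multipart(Vec<ChannelContent>),
}

/// Flatten content into a newline-joined text descriptor, as a
/// text-flatten provider path might. Descriptor wording is illustrative.
pub fn flatten_to_text(content: &ChannelContent) -> String {
    match content {
        ChannelContent::Text(t) => t.clone(),
        ChannelContent::Image { url, mime } => {
            format!("[image: {} ({})]", url, mime.as_deref().unwrap_or("unknown type"))
        }
        ChannelContent::File { url, filename } => format!("[file: {} ({})]", filename, url),
        ChannelContent::Multipart(parts) => {
            // Mirrors the "doc + debug_assert" rule: no Multipart inside Multipart.
            debug_assert!(
                parts.iter().all(|p| !matches!(p, ChannelContent::Multipart(_))),
                "Multipart must not be nested"
            );
            parts.iter().map(flatten_to_text).collect::<Vec<_>>().join("\n")
        }
    }
}
```

A captioned image would then flatten to the caption on one line and the attachment descriptor on the next, so neither part is lost on a provider that only accepts text.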

Out of scope (follow-ups): persisting attachments to disk; vision-provider dispatch refinements beyond exposing existing image bytes; non-Discord inbound attachment classification; handling for files larger than the 5 MB vision cap.

Testing

cargo clippy --workspace --all-targets -- -D warnings passes
cargo test --workspace passes
Live integration tested (if applicable)

Live smoke test on a local-main daemon build, both shapes:

Captioned image ("Deployed..." + PNG) → daemon log shows Multipart([Text, Image{...}]); model prompt contains caption + [attachment: image/png image, ~3172 KB — not viewable on this provider]; LLM acknowledgement coherent.
Bare image (PNG, no body) → daemon log shows image dispatch; model prompt contains [attachment: image/png image, ~2774 KB — not viewable on this provider]; LLM acknowledgement coherent.
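
The marker format seen in both smoke tests can be sketched as follows. The function names and the truncating bytes-to-KB division are assumptions; only the marker text itself is taken from the description above.

```rust
/// Render the placeholder for an image block on a provider that cannot view images.
/// The KB figure uses integer division by 1024 (an assumption about rounding).
pub fn render_image_marker(mime: &str, size_bytes: usize) -> String {
    let kb = size_bytes / 1024;
    // \u{2014} is the dash used in the marker text quoted in this PR.
    format!(
        "[attachment: {} image, ~{} KB \u{2014} not viewable on this provider]",
        mime, kb
    )
}

/// Build the text the model sees: the caption (if any) followed by the marker.
pub fn render_text_prompt(caption: Option<&str>, mime: &str, size_bytes: usize) -> String {
    let marker = render_image_marker(mime, size_bytes);
    match caption {
        Some(text) if !text.is_empty() => format!("{}\n{}", text, marker),
        // Bare image: the marker alone becomes the prompt instead of an empty message.
        _ => marker,
    }
}
```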

Added 9 parser unit tests covering all (text-empty, n-attachments) shapes plus MIME edge cases (HEIC, oversize, missing content_type), and 2 driver unit tests covering captioned and bare-image marker rendering.

CI note: a pre-existing cargo fmt --all -- --check failure on main (introduced in da6b567a and earlier; tracked by #1121) is inherited by this branch. None of the failing fmt lines are in files this PR modifies. Happy to address as a separate fmt-only PR if maintainers prefer.

Security

No new unsafe code
No secrets or API keys in diff
User input validated at boundaries
No personal agent names in diff, description, tests, or fixtures

benhoverter · 2026-04-30T23:06:05Z

On the Security Audit failure: this PR's Cargo.lock is byte-identical to RightNow-AI/openfang's main.

git diff --stat rightnow-ai/main...discord-file-sharing -- Cargo.lock        # empty
git diff --stat rightnow-ai/main...discord-file-sharing -- '**/Cargo.toml'   # empty

Zero deps added, removed, or bumped. The same advisories fire on main's latest CI run (25124418579, head 15ed29c = bump v0.6.2), so the failure is pre-existing on main rather than introduced here.

Pick 3a of the Discord file-passing plan: extend the URL-flavored File variant with optional mime and size metadata so adapters can pass attachment context through to bridges. FileData (bytes-flavored) is unchanged; size is implicit in data.len() and mime_type already exists.

Match-arm sites in bridge.rs, telegram.rs, whatsapp.rs use `..` to stay forward-compatible. Construction sites in telegram.rs and kernel.rs pass `mime: None, size: None` for now; Discord inbound (PR-A) will populate them.

Refs: projects/openfang-fork/discord-file-passing-plan.md
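
As a rough illustration of why the `..` rest pattern keeps match arms forward-compatible, consider the sketch below. The enum is a cut-down stand-in; the `filename` field, the `describe` helper, and the exact field types are assumptions and not taken from the diff.

```rust
/// Hypothetical, cut-down version of the content enum.
#[derive(Debug, Clone, PartialEq)]
pub enum ChannelContent {
    Text(String),
    /// URL-flavored file: mime and size are optional metadata.
    File {
        url: String,
        filename: String,
        mime: Option<String>,
        size: Option<u64>,
    },
    /// Bytes-flavored file: size is implicit in data.len().
    FileData {
        data: Vec<u8>,
        filename: String,
        mime_type: String,
    },
}

pub fn describe(content: &ChannelContent) -> String {
    match content {
        ChannelContent::Text(t) => t.clone(),
        // `..` ignores mime/size here, and would ignore any field added later,
        // so this arm keeps compiling when the variant grows.
        ChannelContent::File { url, filename, .. } => format!("{} <{}>", filename, url),
        ChannelContent::FileData { data, filename, .. } => {
            format!("{} ({} bytes)", filename, data.len())
        }
    }
}
```

Construction sites, by contrast, must name every field, which is why they pass `mime: None, size: None` explicitly until an adapter has real metadata to supply.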

Pick 3a-bis of the Discord file-passing plan: teach the multimodal image fetcher to handle file:// URLs by reading from local disk instead of going through reqwest. PR-A (Discord inbound) will materialize attachments to a shared inbox dir and emit ChannelContent::Image { url: "file://..." }, so this branch is what unblocks vision on inbox-materialized images after the Discord CDN URL has expired.

Implementation:

- Branch on url.strip_prefix("file://"); local read uses tokio::fs::read.
- HTTP path unchanged. Both paths converge on (Vec<u8>, Option<String>) before the existing 5MB cap, magic-byte sniffing, and base64 path.
- No content-type header on file://: magic-byte detection and URL extension fallback do all the media-type work, which is fine since detect_image_magic and media_type_from_url already exist.
- No new deps. Vec<u8> instead of bytes::Bytes to avoid pulling in the bytes crate as a direct dep.
- No URL percent-decoding: the inbox writer (PR-A) controls filenames and avoids characters that would need encoding.

Refs: projects/openfang-fork/discord-file-passing-plan.md (step 2)
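
The file:// branch can be sketched with the standard library alone. Note the assumptions: the commit uses tokio::fs::read, whereas this sketch swaps in the blocking std::fs::read so it has no dependencies; the HTTP path is stubbed out as an error; and `fetch_image_bytes` plus the body of `detect_image_magic` are illustrative guesses rather than the project's implementations.

```rust
use std::fs;
use std::io;

/// Returns (bytes, content_type). For file:// there is no content-type header,
/// so the second element is None and the media type must be sniffed later.
pub fn fetch_image_bytes(url: &str) -> io::Result<(Vec<u8>, Option<String>)> {
    if let Some(path) = url.strip_prefix("file://") {
        // No percent-decoding: the path is used exactly as written in the URL.
        let bytes = fs::read(path)?;
        Ok((bytes, None))
    } else {
        // The real fetcher goes through reqwest here; omitted from this sketch.
        Err(io::Error::new(
            io::ErrorKind::Other,
            "HTTP path omitted from this sketch",
        ))
    }
}

/// Sniff a media type from the leading magic bytes (illustrative subset).
pub fn detect_image_magic(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A]) {
        Some("image/png")
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some("image/jpeg")
    } else if bytes.starts_with(b"GIF8") {
        Some("image/gif")
    } else {
        None
    }
}
```

Because both branches return the same `(Vec<u8>, Option<String>)` shape, everything downstream (size cap, sniffing, base64 encoding) can stay unaware of where the bytes came from.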

Adds a single tracing::debug! at the top of parse_discord_message that dumps the full payload JSON. Silent at default `info` level; enable with `RUST_LOG=openfang_channels::discord=debug` to capture real attachment JSON when developing the file-passing parse code. Logs before any filters (bot, allowed_users, allowed_guilds, empty content) so attachment-only messages are visible too.

Discord MESSAGE_CREATE payloads with attachments were previously parsed in a way that either dropped the attachment (when text was present, only the text was kept) or dropped the whole message (when text was empty, the early `content.is_empty()` return killed bare-image posts). The result on text-only providers like claude-code: silent drops, then hallucinated acknowledgements of content the model never saw.

This rewires the inbound path end-to-end:

* types: add ChannelContent::Multipart(Vec<ChannelContent>) so a single inbound message can carry a caption + one or more attachments as sibling blocks. Doc forbids nesting; consumers debug_assert.
* discord: classify attachments by MIME (with extension fallback for bot-relayed payloads that omit content_type) and a 5 MB vision-size cap matching Anthropic's image block limit. Vision-eligible images become ChannelContent::Image; everything else becomes File. Emit Multipart whenever text and attachments coexist, or when there are multiple attachments.
* bridge: flat-map Multipart in both dispatch paths: into Vec<ContentBlock> for multimodal-capable providers, and into a newline-joined text descriptor for text-flatten providers.
* telegram: add the Multipart arm to send_to_user for exhaustive-match parity; flattens defensively.
* claude_code driver: render Image blocks as "[attachment: <mime> image, ~N KB — not viewable on this provider]" instead of dropping them. The model still cannot see the image, but it can acknowledge it coherently rather than confabulating.

Adds 9 discord parser tests covering all (text, attachment-count) shapes plus MIME edge cases, and 2 claude_code driver tests covering captioned and bare-image rendering.
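
The attachment classification rule could be sketched along these lines. Everything named here is hypothetical (`AttachmentKind`, `classify`, `mime_from_extension`), and the set of vision-eligible types (PNG, JPEG, GIF, WebP) is an assumption; in particular, treating HEIC as a plain File is a guess at what the HEIC test case checks, not something the commit message states.

```rust
/// 5 MB vision-size cap, per the commit message.
const VISION_MAX_BYTES: u64 = 5 * 1024 * 1024;

#[derive(Debug, PartialEq)]
pub enum AttachmentKind {
    Image,
    File,
}

/// Fallback for payloads that omit content_type: guess from the extension.
fn mime_from_extension(filename: &str) -> Option<&'static str> {
    let ext = filename.rsplit_once('.')?.1.to_ascii_lowercase();
    match ext.as_str() {
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        "gif" => Some("image/gif"),
        "webp" => Some("image/webp"),
        _ => None,
    }
}

/// Classify one attachment as vision-eligible Image or generic File.
pub fn classify(content_type: Option<&str>, filename: &str, size: u64) -> AttachmentKind {
    // Prefer the declared MIME type; fall back to the filename extension.
    let mime = content_type.or_else(|| mime_from_extension(filename));
    let vision_mime = matches!(
        mime,
        Some("image/png" | "image/jpeg" | "image/gif" | "image/webp")
    );
    if vision_mime && size <= VISION_MAX_BYTES {
        AttachmentKind::Image
    } else {
        // Over-cap images and every other type travel as File.
        AttachmentKind::File
    }
}
```

Keeping the size check inside the classifier means an oversize PNG degrades to a File descriptor rather than failing later at the provider's image-block limit.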

Discord's CDN edges occasionally advertise `content-encoding: gzip` (or deflate/brotli) on PNG/JPEG passthroughs while the body is raw, uncompressed image bytes. With the default `reqwest::Client::new()` and the workspace's gzip/deflate/brotli features all enabled, reqwest's transparent-decompression layer chokes on the PNG/JPEG header and returns "error decoding response body" only on `bytes().await` (not on `send()`), causing `download_image_to_blocks` to silently fall back to a text-only block, so the user's image never reaches the model.

Build the client explicitly with no_gzip/no_deflate/no_brotli so the request advertises identity encoding and the body is read raw. Also set a User-Agent (some CDN edges 403 clients without one) and a 30s timeout aligned with the upstream 5 MB cap.

Repro: send an image attachment via Discord; the daemon logs `Failed to read image bytes: error decoding response body` and the turn appends as text-only with `appended_has_image=false`. After this fix the PNG bytes are read and emitted as an Image content block as intended.

benhoverter added 4 commits May 2, 2026 15:16

benhoverter force-pushed the discord-file-sharing branch from b6cccd8 to 118eace Compare May 2, 2026 22:27

benhoverter mentioned this pull request May 3, 2026

runtime/claude_code: materialize image blocks to tmpfile + extract image_cache module #1151 (Open, 4 tasks)

benhoverter mentioned this pull request May 4, 2026

feat(channels/discord) Outbound file/image attachments + image_cache hardening #1162 (Open, 4 tasks)

benhoverter wants to merge 5 commits into RightNow-AI:main from benhoverter:discord-file-sharing
