fix(channels/discord): surface image attachments to text-only providers #1143

Open
benhoverter wants to merge 5 commits into RightNow-AI:main from benhoverter:discord-file-sharing

@benhoverter (Contributor) commented Apr 30, 2026

Summary

Fixes #1142.

Discord image attachments were silently dropped before reaching the model on text-only providers. Captioned images had attachments discarded (caption-wins path); bare-image messages were dropped entirely (early content.is_empty() return). Result: hallucinated acknowledgements of unseen images, or no response at all.

This PR rewires the inbound attachment path end-to-end so attachments survive parsing, dispatch coherently across vision-capable and text-only providers, and render as a clear marker on text-only providers instead of being silently elided.

Changes

  • openfang-channels types: new ChannelContent::Multipart(Vec<ChannelContent>) variant for caption + attachment(s) as sibling blocks. Nesting forbidden (doc + debug_assert).
  • Discord parser: rewritten to handle bare-image and captioned-image shapes correctly. MIME classification with filename-extension fallback for payloads missing content_type. 5 MB vision-size cap matching Anthropic's image-block limit; over-cap images classify as File. Emits Multipart whenever text + attachments coexist or multiple attachments are present.
  • Bridge: flat-maps Multipart in both dispatch paths — Vec<ContentBlock> for multimodal-capable providers, newline-joined text descriptor for text-flatten providers.
  • Telegram channel: exhaustive-match parity for the new variant; defensive outbound flatten.
  • claude_code driver: renders Image blocks as [attachment: <mime> image, ~N KB — not viewable on this provider] instead of dropping them. The model still cannot see the image, but it can acknowledge it coherently rather than confabulating.
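
As a rough illustration of the Multipart variant and the text-flatten dispatch path listed above, here is a minimal sketch. The enum is a simplified, hypothetical stand-in for the real `ChannelContent` (whose variants carry more fields), and `flatten_to_text` and the descriptor wording are assumptions made for this example, not the bridge's actual code.

```rust
// Simplified, hypothetical stand-in for the channel content type.
// The real enum in openfang-channels has more variants and fields.
#[derive(Debug, Clone, PartialEq)]
enum ChannelContent {
    Text(String),
    Image { url: String, mime: Option<String> },
    File { url: String, filename: String },
    // Caption plus attachment(s) as sibling blocks; nesting is forbidden.
    Multipart(Vec<ChannelContent>),
}

/// Flatten content into a newline-joined text descriptor, roughly as a
/// text-flatten dispatch path might. The descriptor format is illustrative.
fn flatten_to_text(content: &ChannelContent) -> String {
    match content {
        ChannelContent::Text(t) => t.clone(),
        ChannelContent::Image { url, mime } => {
            format!("[image: {} ({})]", url, mime.as_deref().unwrap_or("unknown type"))
        }
        ChannelContent::File { url, filename } => format!("[file: {} ({})]", filename, url),
        ChannelContent::Multipart(parts) => {
            // Enforce the no-nesting contract in debug builds only.
            debug_assert!(
                parts.iter().all(|p| !matches!(p, ChannelContent::Multipart(_))),
                "Multipart must not be nested"
            );
            parts.iter().map(flatten_to_text).collect::<Vec<_>>().join("\n")
        }
    }
}
```

Calling `flatten_to_text` on a `Multipart` holding a `Text` caption and one `Image` yields the caption on the first line and the image descriptor on the second. A multimodal path would instead map each sibling to its own content block.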

Out of scope (follow-ups): persisting attachments to disk; vision-provider dispatch refinements beyond exposing existing image bytes; non-Discord inbound attachment classification; handling for files larger than the 5 MB vision cap.

Testing

  • cargo clippy --workspace --all-targets -- -D warnings passes
  • cargo test --workspace passes
  • Live integration tested (if applicable)

Live smoke test on a local-main daemon build, both shapes:

  • Captioned image ("Deployed..." + PNG) → daemon log shows Multipart([Text, Image{...}]); model prompt contains caption + [attachment: image/png image, ~3172 KB — not viewable on this provider]; LLM acknowledgement coherent.
  • Bare image (PNG, no body) → daemon log shows image dispatch; model prompt contains [attachment: image/png image, ~2774 KB — not viewable on this provider]; LLM acknowledgement coherent.
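
The marker strings in the smoke test above could be produced by something like the following sketch. The function names are hypothetical, and computing the size as whole kibibytes of the decoded image is an assumption; the driver may derive the approximate size differently.

```rust
/// Render a placeholder for an image that a text-only provider cannot view.
/// Assumption: `size_bytes` is the decoded image size, reported in whole KB.
fn render_image_marker(mime: &str, size_bytes: usize) -> String {
    let kb = size_bytes / 1024;
    format!("[attachment: {mime} image, ~{kb} KB — not viewable on this provider]")
}

/// Join an optional caption and the marker into the text shown to the model.
/// A bare image (no caption) yields the marker alone.
fn render_prompt(caption: Option<&str>, mime: &str, size_bytes: usize) -> String {
    let marker = render_image_marker(mime, size_bytes);
    match caption {
        Some(text) if !text.is_empty() => format!("{text}\n{marker}"),
        _ => marker,
    }
}
```

With a caption the prompt contains both lines; without one it contains only the marker, which matches the two shapes exercised in the smoke test.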

Added 9 parser unit tests covering all (text-empty, n-attachments) shapes plus MIME edge cases (HEIC, oversize, missing content_type), and 2 driver unit tests covering captioned and bare-image marker rendering.

CI note: a pre-existing cargo fmt --all -- --check failure on main (introduced in da6b567a and earlier; tracked by #1121) is inherited by this branch. None of the failing fmt lines are in files this PR modifies. Happy to address as a separate fmt-only PR if maintainers prefer.

Security

  • No new unsafe code
  • No secrets or API keys in diff
  • User input validated at boundaries
  • No personal agent names in diff, description, tests, or fixtures

@benhoverter (Contributor, Author) commented:

On the Security Audit failure: this PR's Cargo.lock is byte-identical to RightNow-AI/openfang's main.

git diff --stat rightnow-ai/main...discord-file-sharing -- Cargo.lock        # empty
git diff --stat rightnow-ai/main...discord-file-sharing -- '**/Cargo.toml'   # empty

Zero deps added, removed, or bumped. The same advisories fire on main's latest CI run (25124418579, head 15ed29c = bump v0.6.2), so the failure is pre-existing on main rather than introduced here.

Pick 3a of the Discord file-passing plan: extend the URL-flavored File
variant with optional mime and size metadata so adapters can pass
attachment context through to bridges. FileData (bytes-flavored) is
unchanged; size is implicit in data.len() and mime_type already exists.

Match-arm sites in bridge.rs, telegram.rs, whatsapp.rs use `..` to stay
forward-compatible. Construction sites in telegram.rs and kernel.rs
pass `mime: None, size: None` for now; Discord inbound (PR-A) will
populate them.

Refs: projects/openfang-fork/discord-file-passing-plan.md

Pick 3a-bis of the Discord file-passing plan: teach the multimodal
image fetcher to handle file:// URLs by reading from local disk
instead of going through reqwest. PR-A (Discord inbound) will
materialize attachments to a shared inbox dir and emit
ChannelContent::Image { url: "file://..." }, so this branch is what
unblocks vision on inbox-materialized images after the Discord CDN
URL has expired.

Implementation:
- Branch on url.strip_prefix("file://"); local read uses tokio::fs::read.
- HTTP path unchanged. Both paths converge on (Vec<u8>, Option<String>)
  before the existing 5MB cap, magic-byte sniffing, and base64 path.
- No content-type header on file:// — magic-byte detection and URL
  extension fallback do all the media-type work, which is fine since
  detect_image_magic and media_type_from_url already exist.
- No new deps. Vec<u8> instead of bytes::Bytes to avoid pulling in
  the bytes crate as a direct dep.
- No URL percent-decoding: the inbox writer (PR-A) controls filenames
  and avoids characters that would need encoding.
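
The branching described in the list above might look roughly like this. It is a synchronous sketch using `std::fs::read` so that it stays dependency-free, whereas the commit uses `tokio::fs::read`. The function name `fetch_image_bytes` and the stubbed HTTP branch are assumptions for illustration.

```rust
use std::fs;
use std::io;

/// Raw bytes plus an optional content-type header value, the shape both
/// paths converge on before the size cap and magic-byte sniffing.
type Fetched = (Vec<u8>, Option<String>);

/// Hypothetical fetcher: reads file:// URLs from local disk.
fn fetch_image_bytes(url: &str) -> io::Result<Fetched> {
    if let Some(path) = url.strip_prefix("file://") {
        // Local read: there is no content-type header, so media-type
        // detection is left to later stages. The path is used as-is,
        // with no percent-decoding.
        let bytes = fs::read(path)?;
        Ok((bytes, None))
    } else {
        // The HTTP path is unchanged in the commit and omitted here.
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "HTTP path omitted from this sketch",
        ))
    }
}
```

For example, `fetch_image_bytes("file:///tmp/inbox/a.png")` would return the file's bytes with `None` for the content type, provided that (hypothetical) path exists.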

Refs: projects/openfang-fork/discord-file-passing-plan.md (step 2)

Adds a single tracing::debug! at the top of parse_discord_message that
dumps the full payload JSON. Silent at default `info` level; enable with
`RUST_LOG=openfang_channels::discord=debug` to capture real attachment
JSON when developing the file-passing parse code.

Logs before any filters (bot, allowed_users, allowed_guilds, empty
content) so attachment-only messages are visible too.

Discord MESSAGE_CREATE payloads with attachments were previously parsed
in a way that either dropped the attachment (when text was present, only
the text was kept) or dropped the whole message (when text was empty,
the early `content.is_empty()` return killed bare-image posts). The
result on text-only providers like claude-code: silent drops, then
hallucinated acknowledgements of content the model never saw.

This rewires the inbound path end-to-end:

* types: add ChannelContent::Multipart(Vec<ChannelContent>) so a single
  inbound message can carry a caption + one or more attachments as
  sibling blocks. Doc forbids nesting; consumers debug_assert.

* discord: classify attachments by MIME (with extension fallback for
  bot-relayed payloads that omit content_type) and a 5 MB vision-size
  cap matching Anthropic's image block limit. Vision-eligible images
  become ChannelContent::Image; everything else becomes File. Emit
  Multipart whenever text and attachments coexist, or when there are
  multiple attachments.

* bridge: flat-map Multipart in both dispatch paths — into Vec<ContentBlock>
  for multimodal-capable providers, and into a newline-joined text
  descriptor for text-flatten providers.

* telegram: add the Multipart arm to send_to_user for exhaustive-match
  parity; flattens defensively.

* claude_code driver: render Image blocks as
  "[attachment: <mime> image, ~N KB — not viewable on this provider]"
  instead of dropping them. The model still cannot see the image, but
  it can acknowledge it coherently rather than confabulating.
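
A minimal sketch of the classification and Multipart-emission rules from the list above, under stated assumptions: the names `classify`, `mime_from_extension`, and `needs_multipart` are hypothetical, the allow-list of vision-eligible types and the extension table are illustrative, and treating exactly 5 MB as within the cap is a guess. The commit does not spell out how types outside the allow-list (HEIC, for instance) are handled; here they simply fall through to File.

```rust
/// 5 MB vision-size cap, per the commit message.
const VISION_MAX_BYTES: u64 = 5 * 1024 * 1024;

#[derive(Debug, PartialEq)]
enum AttachmentKind {
    Image,
    File,
}

/// Guess a MIME type from the filename extension, for payloads that omit
/// content_type. The table is illustrative, not the PR's actual list.
fn mime_from_extension(filename: &str) -> Option<&'static str> {
    let ext = filename.rsplit_once('.')?.1.to_ascii_lowercase();
    match ext.as_str() {
        "png" => Some("image/png"),
        "jpg" | "jpeg" => Some("image/jpeg"),
        "gif" => Some("image/gif"),
        "webp" => Some("image/webp"),
        _ => None,
    }
}

/// Classify one attachment: vision-eligible images become Image,
/// everything else (including over-cap images) becomes File.
fn classify(content_type: Option<&str>, filename: &str, size: u64) -> AttachmentKind {
    let mime = content_type.or_else(|| mime_from_extension(filename));
    let eligible = matches!(
        mime,
        Some("image/png" | "image/jpeg" | "image/gif" | "image/webp")
    );
    if eligible && size <= VISION_MAX_BYTES {
        AttachmentKind::Image
    } else {
        AttachmentKind::File
    }
}

/// Multipart is emitted when text and attachments coexist, or when there
/// are multiple attachments.
fn needs_multipart(has_text: bool, attachment_count: usize) -> bool {
    (has_text && attachment_count >= 1) || attachment_count > 1
}
```

Under these assumptions a bare single image stays a plain Image, a captioned image becomes Multipart, and a 6 MB PNG is classified as File.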

Adds 9 discord parser tests covering all (text, attachment-count) shapes
plus MIME edge cases, and 2 claude_code driver tests covering captioned
and bare-image rendering.

Discord's CDN edges occasionally advertise `content-encoding: gzip` (or
deflate/brotli) on PNG/JPEG passthroughs while the body is raw,
uncompressed image bytes. With the default `reqwest::Client::new()` and
the workspace's gzip/deflate/brotli features all enabled, reqwest's
transparent-decompression layer chokes on the PNG/JPEG header and
returns "error decoding response body" only on `bytes().await` (not on
`send()`), causing `download_image_to_blocks` to silently fall back to a
text-only block — the user's image never reaches the model.

Build the client explicitly with no_gzip/no_deflate/no_brotli so the
request advertises identity encoding and the body is read raw. Also set
a User-Agent (some CDN edges 403 clients without one) and a 30s timeout
aligned with the upstream 5 MB cap.

Repro: send an image attachment via Discord; the daemon logs
`Failed to read image bytes: error decoding response body` and the turn
appends as text-only with `appended_has_image=false`. After this fix the
PNG bytes are read and emitted as an Image content block as intended.

Development

Successfully merging this pull request may close these issues.

Discord image attachments are silently dropped on text-only model providers