Skip to content

fix: preserve output type in FunctionSpanData.export()#2871

Open
lawrence3699 wants to merge 1 commit intoopenai:mainfrom
lawrence3699:fix/function-span-export-output
Open

fix: preserve output type in FunctionSpanData.export()#2871
lawrence3699 wants to merge 1 commit intoopenai:mainfrom
lawrence3699:fix/function-span-export-output

Conversation

@lawrence3699
Copy link
Copy Markdown

Summary

FunctionSpanData.export() applies str(self.output) if self.output else None, which corrupts traced tool outputs in two ways:

  1. Dict/list outputs become Python repr strings. A tool returning {"status": "ok"} gets exported as the string "{'status': 'ok'}" — single quotes, Python-style booleans — instead of staying as a dict that json.dumps can serialize correctly. This means the tracing ingest API receives malformed data.

  2. Falsy but valid outputs are dropped. A tool returning 0, False, "", or [] gets exported as None, losing the actual result.

The sibling class GenerationSpanData.export() passes self.output through directly without conversion. This fix aligns FunctionSpanData with that pattern.

Test plan

  • Added tests/tracing/test_span_data.py with 10 unit tests covering dict, string, None, falsy (0, False, empty string, empty list), list, and numeric outputs.
  • Updated MCP tracing snapshot assertions to expect dict outputs instead of Python repr strings.
  • Full suite: 2446 passed, 0 failures.
  • make format, make lint, make typecheck all clean.

Issue number

N/A — found via code inspection; no existing issue.

Checks

  • I've added new tests (if relevant)
  • I've added/updated the relevant documentation
  • I've run make lint and make format
  • I've made sure tests pass

FunctionSpanData.export() applied str(self.output) if self.output else
None, which corrupted outputs in two ways:

1. Dict/list outputs were converted to Python repr strings (single
   quotes, Python booleans) instead of remaining as dicts that
   json.dumps can serialize correctly.

2. Falsy but valid outputs (0, False, empty string, empty list) were
   silently replaced with None.

This is inconsistent with GenerationSpanData.export(), which passes
self.output through directly.

The fix removes the str() conversion to match the sibling pattern.
Updated MCP tracing snapshots accordingly and added unit tests covering
dict, string, None, falsy, list, and numeric output values.
Copilot AI review requested due to automatic review settings April 10, 2026 00:51
@github-actions github-actions bot added bug Something isn't working feature:tracing labels Apr 10, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b1041c2acd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

"name": self.name,
"input": self.input,
"output": str(self.output) if self.output else None,
"output": self.output,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve JSON-safe fallback for function span outputs

Passing self.output through unmodified can break trace export when a tool returns a non-JSON-serializable object (for example a custom class or datetime) and the BackendSpanExporter is configured with a non-default endpoint. In that configuration _should_sanitize_for_openai_tracing_api() is false, so httpx.Client.post(..., json=payload) receives raw span data and raises during JSON encoding, whereas the previous str(...) conversion kept function span payloads serializable. This makes tracing brittle for custom exporters even though the same tool output still works for run execution.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes FunctionSpanData.export() so tool outputs are exported without being coerced to str(...) or dropped when falsy, aligning behavior with GenerationSpanData.export() and correcting MCP tracing snapshots.

Changes:

  • Update FunctionSpanData.export() to pass through output directly (preserving dict/list and falsy values).
  • Add unit tests covering various output shapes and falsy values for FunctionSpanData.export().
  • Update MCP tracing test expectations to assert structured dict outputs instead of Python repr strings.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/agents/tracing/span_data.py Changes function span export to preserve output values/types.
tests/tracing/test_span_data.py Adds unit tests validating output preservation across types, including falsy values.
tests/mcp/test_mcp_tracing.py Updates snapshot assertions to expect structured dict tool outputs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 88 to 93
"type": self.type,
"name": self.name,
"input": self.input,
"output": str(self.output) if self.output else None,
"output": self.output,
"mcp_data": self.mcp_data,
}
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FunctionSpanData.export() now returns self.output verbatim, but Trace.export()/BackendSpanExporter expect the exported payload to be JSON-serializable. Tool results can be arbitrary Python objects (including Pydantic models like ToolOutputText), which will make httpx.post(..., json=payload) raise TypeError for non-OpenAI endpoints (and even for the OpenAI endpoint the sanitizer will degrade such values to a generic "<... truncated>" preview). Consider normalizing output to a JSON-compatible value (e.g., pass through dict/list/str/number/bool/None; convert Pydantic models via model_dump(mode="json"), dataclasses via asdict, tuples/sets via list; otherwise fall back to str(output)), while still preserving falsy values like 0/False/""/[].

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working feature:tracing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants