
refactor(openai): Split token counting by API for easier deprecation#5930

Merged
ericapisani merged 10 commits into master from ep/py-2271-refactor-openai-token-counting-8y2
Apr 2, 2026

Conversation

@ericapisani
Member

@ericapisani ericapisani commented Apr 1, 2026

  • Ensures that token usage is correctly passed into the relevant token calculation method (when applicable)
  • Replace shared _calculate_token_usage() and _get_usage() with two API-specific functions: _calculate_completions_token_usage() (Chat Completions / Embeddings) and _calculate_responses_token_usage() (Responses API)
  • Each function accesses only the exact field names for its API, eliminating multi-name lookup patterns like ["input_tokens", "prompt_tokens"]
  • Add support for streaming_message_token_usage from stream_options={"include_usage": True} in streaming Completions
  • Add API section comments in _set_common_output_data to clarify which branch handles which API
  • Convert all call sites to use keyword arguments for readability
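The split described above can be sketched as follows. This is a simplified illustration with hypothetical function bodies, not the actual code in sentry_sdk/integrations/openai.py; the field names come from the bullet points above.

```python
from types import SimpleNamespace


def _calculate_completions_token_usage(usage):
    # Chat Completions / Embeddings expose prompt_/completion_ field names.
    return {
        "input_tokens": getattr(usage, "prompt_tokens", None),
        "output_tokens": getattr(usage, "completion_tokens", None),
        "total_tokens": getattr(usage, "total_tokens", None),
    }


def _calculate_responses_token_usage(usage):
    # The Responses API exposes input_/output_ field names instead, so each
    # function reads exactly one set of names -- no multi-name lookups.
    return {
        "input_tokens": getattr(usage, "input_tokens", None),
        "output_tokens": getattr(usage, "output_tokens", None),
        "total_tokens": getattr(usage, "total_tokens", None),
    }


# A Chat Completions usage object only carries the prompt_/completion_ names:
chat_usage = SimpleNamespace(prompt_tokens=3, completion_tokens=5, total_tokens=8)
result = _calculate_completions_token_usage(chat_usage)
```

Deleting Chat Completions support then reduces to deleting one function and its call sites, with no shared branches to audit.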

Motivation

When Chat Completions is deprecated, removing it should be a simple delete operation without auditing shared code. Before this change, _calculate_token_usage handled both APIs with interleaved logic, making it unclear what was safe to remove.

We also needed to ensure that, when handling streamed responses from the chat completions API, the usage in the final chunk of the message correctly made its way to the _calculate_*_token_usage method.
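With stream_options={"include_usage": True}, the Chat Completions API sends usage only on the final chunk of the stream, whose choices list is empty. A minimal sketch of capturing it (simulated chunk objects, not the SDK's wrapper code):

```python
from types import SimpleNamespace


def consume_stream(chunks):
    """Collect content deltas and the usage carried by the final chunk."""
    streaming_message_token_usage = None
    content = []
    for chunk in chunks:
        if getattr(chunk, "usage", None) is not None:
            # Only the last chunk carries usage when include_usage is set.
            streaming_message_token_usage = chunk.usage
        for choice in chunk.choices:
            if choice.delta.content is not None:
                content.append(choice.delta.content)
    return "".join(content), streaming_message_token_usage


# Simulated stream: one content chunk, then the usage-only final chunk.
chunks = [
    SimpleNamespace(
        usage=None,
        choices=[SimpleNamespace(delta=SimpleNamespace(content="Hi"))],
    ),
    SimpleNamespace(usage=SimpleNamespace(total_tokens=9), choices=[]),
]
text, usage = consume_stream(chunks)
```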

Other design decisions to note

Some code is duplicated between the Completions API and Responses API token counting.

This is intentional: it allows the Completions API logic to be removed cleanly once that API is fully deprecated and removed from the OpenAI API.

ericapisani and others added 2 commits April 1, 2026 13:11
…Responses API functions

Replace the shared `_calculate_token_usage()` and `_get_usage()` with
two API-specific functions: `_calculate_completions_token_usage()` and
`_calculate_responses_token_usage()`. This makes it clear which token
fields belong to which API and enables clean removal of Chat Completions
support when it is deprecated.

- Completions function extracts `prompt_tokens`, `completion_tokens`,
  `total_tokens` and supports `streaming_message_token_usage` for
  stream_options include_usage
- Responses function extracts `input_tokens`, `output_tokens`,
  `total_tokens` plus `cached_tokens` and `reasoning_tokens` details
- Add API section comments in `_set_common_output_data`
- Update all call sites to use the appropriate API-specific function
- Convert Completions call sites to use keyword arguments
- Update and rename unit tests; add Responses API token usage tests
- Add sync and async streaming tests for usage-in-stream

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…n_usage call sites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ericapisani ericapisani requested a review from a team as a code owner April 1, 2026 11:30
@ericapisani ericapisani marked this pull request as draft April 1, 2026 11:30
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Semver Impact of This PR

🟢 Patch (bug fixes)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (integrations) Instrument pyreqwest tracing by servusdei2018 in #5682

Internal Changes 🔧

  • (openai) Split token counting by API for easier deprecation by ericapisani in #5930
  • (opentelemetry) Ignore mypy error by alexander-alderman-webb in #5927
  • Fix license metadata in setup.py by sl0thentr0py in #5934
  • Update validate-pr workflow by stephanie-anderson in #5931

🤖 This preview updates automatically when you update the PR.

@ericapisani
Member Author

bugbot run

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Codecov Results 📊

2084 passed | ⏭️ 170 skipped | Total: 2254 | Pass Rate: 92.46% | Execution Time: 3m 8s

📊 Comparison with Base Branch

| Metric | Change |
| --- | --- |
| Total Tests | 📈 +134 |
| Passed Tests | 📈 +298 |
| Failed Tests | |
| Skipped Tests | 📉 -164 |

All tests are passing successfully.

❌ Patch coverage is 0.00%. Project has 12241 uncovered lines.
✅ Project coverage is 41.78%. Comparing base (base) to head (head).

Files with missing lines (1)
| File | Patch % | Lines |
| --- | --- | --- |
| openai.py | 4.18% | ⚠️ 664 Missing |
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
+ Coverage    40.47%    41.78%    +1.31%
==========================================
  Files          190       190         —
  Lines        20963     21027       +64
  Branches      7196      7345      +149
==========================================
+ Hits          8484      8786      +302
- Misses       12479     12241      -238
- Partials       434       475       +41

Generated by Codecov Action


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Autofix Details

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Response usage silently overrides streaming token usage
    • Changed the second if statement to elif to ensure streaming_message_token_usage takes precedence over response.usage, preventing silent data loss.

Create PR

Or push these changes by commenting:

@cursor push 6e65058f68
Preview (6e65058f68)
diff --git a/sentry_sdk/integrations/openai.py b/sentry_sdk/integrations/openai.py
--- a/sentry_sdk/integrations/openai.py
+++ b/sentry_sdk/integrations/openai.py
@@ -164,8 +164,7 @@
 
     if streaming_message_token_usage:
         usage = streaming_message_token_usage
-
-    if hasattr(response, "usage"):
+    elif hasattr(response, "usage"):
         usage = response.usage
 
     if usage is not None:

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

… usage

The refactor that split _calculate_token_usage into separate Completions
and Responses functions dropped extraction of prompt_tokens_details.cached_tokens
and completion_tokens_details.reasoning_tokens from the Completions path.
This restores those fields so spans for cached prompts and reasoning models
(e.g. o1/o3) report complete token usage metrics.

Also fixes streaming usage priority: streaming_message_token_usage now
correctly takes precedence over response.usage via elif.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test used MagicMock(message="one") where message was a plain string,
but the real OpenAI API returns Choice objects with message.content. The
counting code checks hasattr(choice.message, "content"), which failed on
strings, so manual token counting was never exercised. Use real Choice
and ChatCompletionMessage objects and fix the expected output_tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
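The test bug described above is easy to reproduce with the standard library: a bare string has no content attribute, so the hasattr guard skips it, while an object shaped like a real Choice passes. The class names below mimic the real OpenAI types but are local stand-ins:

```python
from unittest.mock import MagicMock


def count_choice(choice):
    # Mirrors the guard in the counting code: only messages that expose a
    # `content` attribute are counted.
    return hasattr(choice.message, "content")


class ChatMessage:
    def __init__(self, content):
        self.content = content


class Choice:
    def __init__(self, message):
        self.message = message


# The old test: message is a plain str, so the guard silently skips it.
broken = MagicMock(message="one")
# The fixed test: a Choice-shaped object with message.content set.
fixed = Choice(ChatMessage("one"))
```

With the MagicMock version, the manual-counting branch was never exercised, so the test passed without testing anything.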
@ericapisani
Member Author

bugbot run


@cursor cursor bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

ericapisani and others added 4 commits April 1, 2026 15:47
The stream_options parameter was not available in early versions of the
OpenAI Python SDK, causing TypeError on v1.0.1 CI runs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…e_total_token_usage

Clarify that this variable holds the total token usage from streaming
responses. Also fix the None check to use `is not None` instead of
truthiness, preventing false negatives when usage has zero-valued fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
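The `is not None` fix matters because a usage object whose counters are all zero can be falsy if its class defines `__bool__`; a truthiness check would then silently discard valid usage data. An illustrative toy class (not the SDK's actual usage type):

```python
class Usage:
    """Toy usage object that is falsy when every counter is zero."""

    def __init__(self, total_tokens):
        self.total_tokens = total_tokens

    def __bool__(self):
        return self.total_tokens > 0


usage = Usage(total_tokens=0)

# A truthiness check drops the zero-valued usage entirely:
recorded_truthy = usage.total_tokens if usage else None

# An explicit `is not None` check keeps it:
recorded_is = usage.total_tokens if usage is not None else None
```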
Replace inline hasattr+isinstance patterns in both _calculate_completions_token_usage
and _calculate_responses_token_usage with a shared helper that uses getattr safely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ericapisani ericapisani marked this pull request as ready for review April 2, 2026 08:53
…aming responses

Move _calculate_completions_token_usage outside the data_buf check so
token usage from stream metadata is recorded even when no content chunks
are produced (e.g. content filter). Also count output tokens from
response.output when streaming_message_responses is absent in the
Responses API path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@alexander-alderman-webb alexander-alderman-webb left a comment


Super nice!
Pretty much all as expected, now handling more edge cases while also ending up with simpler code 🙏.

See my comment regarding two new tests that rely on private functions. After that I can stamp.

@ericapisani ericapisani merged commit fc83474 into master Apr 2, 2026
167 of 169 checks passed
@ericapisani ericapisani deleted the ep/py-2271-refactor-openai-token-counting-8y2 branch April 2, 2026 13:09