
@Speediing commented Jan 26, 2026

Summary

  • Add RECOGNITION_USAGE event emission to ElevenLabs scribe_v2_realtime STT streaming mode
  • Reports audio_duration metrics via periodic collection (every 5 seconds)
  • Matches the pattern used by other STT providers (Deepgram, Gladia, FireworksAI)

Test plan

  • Tested with ElevenLabs STT in console mode
  • Verified STT metrics {"audio_duration": X.X} appears in logs
  • Confirmed metrics emission follows same pattern as Deepgram plugin

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added automatic periodic reporting of audio duration metrics during speech recognition sessions. Recognition usage events now include accumulated audio duration data to better track service usage.

Add RECOGNITION_USAGE event emission to ElevenLabs scribe_v2_realtime
STT streaming mode. This reports audio_duration metrics, matching the
pattern used by other STT providers (Deepgram, Gladia, FireworksAI).

Uses a PeriodicCollector to accumulate and report audio duration every
5 seconds during streaming transcription.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
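For illustration, here is a minimal sketch of a time-based collector that accumulates float durations and hands the running total to a callback once the interval has elapsed. It is not the plugin's _PeriodicCollector. The class name PeriodicCollector, the push/flush method names, and the injectable clock parameter are assumptions, chosen so the behavior can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class PeriodicCollector:
    """Accumulates float values and flushes the running total on an interval.

    Hypothetical sketch; the plugin's private _PeriodicCollector may differ.

    Args:
        callback: Called with the accumulated total on each flush.
        duration: Flush interval in seconds.
        clock: Monotonic time source (injectable so tests need not sleep).
    """

    def __init__(
        self,
        callback: Callable[[float], None],
        *,
        duration: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._callback = callback
        self._duration = duration
        self._clock = clock
        self._total: Optional[float] = None
        self._last_flush = clock()

    def push(self, value: float) -> None:
        """Adds a value, then flushes if the interval has elapsed."""
        self._total = value if self._total is None else self._total + value
        if self._clock() - self._last_flush >= self._duration:
            self.flush()

    def flush(self) -> None:
        """Reports the accumulated total (if any) and resets the buffer."""
        if self._total is not None:
            self._callback(self._total)
            self._total = None
        self._last_flush = self._clock()
```

With 50 ms frames and a 5-second interval, roughly every 100th push triggers a report; calling flush() when the stream ends reports whatever is left over, so no audio goes unaccounted for.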
@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@chenghao-mou requested a review from a team on January 26, 2026 22:18
@coderabbitai
Contributor

coderabbitai bot commented Jan 26, 2026

📝 Walkthrough

The change introduces an internal _PeriodicCollector generic mechanism that buffers audio frame durations and periodically flushes them to a callback. This collector is integrated into the STT and SpeechStream classes to accumulate audio duration data and emit RECOGNITION_USAGE events without modifying existing public APIs.
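A minimal sketch of the callback side, assuming the flushed total is wrapped in a usage event and queued for consumers. The dataclasses below are hypothetical stand-ins that only mirror the shape of the livekit-agents types (the review notes that stt.RecognitionUsage(audio_duration=duration) matches the real dataclass); the enum value and the list standing in for the stream's event channel are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class SpeechEventType(str, Enum):
    # Hypothetical stand-in; only the member relevant here is shown.
    RECOGNITION_USAGE = "recognition_usage"


@dataclass
class RecognitionUsage:
    audio_duration: float  # seconds of audio reported in this flush


@dataclass
class SpeechEvent:
    type: SpeechEventType
    recognition_usage: Optional[RecognitionUsage] = None


class StreamSketch:
    """Hypothetical stream; a plain list stands in for its event channel."""

    def __init__(self) -> None:
        self.events: List[SpeechEvent] = []

    def _on_audio_duration_report(self, duration: float) -> None:
        # Called by the collector on each flush with the accumulated seconds.
        usage_event = SpeechEvent(
            type=SpeechEventType.RECOGNITION_USAGE,
            recognition_usage=RecognitionUsage(audio_duration=duration),
        )
        self.events.append(usage_event)
```

Keeping the emission in a single handler means the collector stays generic: it only knows how to sum and call back, while the stream decides what event to build from the total.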

Changes

| Cohort / File(s) | Summary |
|---|---|
| Periodic Usage Collection (livekit/plugins/elevenlabs/stt.py) | Introduces _PeriodicCollector[T] generic class for buffering and periodic flushing. Integrates into STT and SpeechStream to track audio frame durations (50 ms frames), with _on_audio_duration_report handler emitting RECOGNITION_USAGE events when flushed (5 s default interval). |
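To make the numbers concrete: a frame's duration is its per-channel sample count divided by the sample rate, so a 50 ms frame is 800 samples at 16 kHz (2400 at 48 kHz), and a 5-second interval spans about 100 such frames. The FrameInfo type and frames_per_report helper below are hypothetical illustrations, not plugin code.

```python
from dataclasses import dataclass


@dataclass
class FrameInfo:
    """Hypothetical stand-in for an audio frame's metadata."""

    samples_per_channel: int
    sample_rate: int

    @property
    def duration(self) -> float:
        # Seconds of audio; independent of channel count because the
        # sample count is per channel.
        return self.samples_per_channel / self.sample_rate


def frames_per_report(frame_duration: float, interval: float = 5.0) -> int:
    """Approximate number of frames that fill one reporting interval."""
    return round(interval / frame_duration)


print(FrameInfo(samples_per_channel=800, sample_rate=16000).duration)
print(frames_per_report(0.05))
```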

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • davidzhao

Poem

🐰 A collector so periodic, hopping through the stream,
Buffering frames with metric dreams, five seconds in between,
Audio durations gathered 'round, then flushed with care,
Usage events now reported fair—transparency in the air! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 22.22%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding audio duration tracking for ElevenLabs STT. It is concise, specific, and directly reflects the primary objective of the changeset. |


Contributor

@coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)

388-402: Add try-finally to send_task to ensure flush on cancellation.

If send_task is cancelled while processing frames (via gracefully_cancel at line 459), the flush() at line 402 never executes, losing any accumulated audio duration metrics since the last periodic flush interval. FireworksAI's plugin handles this correctly by calling flush() in the outer finally block.

Wrap the async for loop in a try-finally:

Suggested fix
async def send_task(ws: aiohttp.ClientWebSocketResponse) -> None:
    nonlocal closing_ws

    # ... setup code ...

    try:
        async for data in self._input_ch:
            # ... frame processing ...
            ...
    finally:
        # Runs on normal exit and on cancellation, so buffered durations are reported.
        self._audio_duration_collector.flush()
        closing_ws = True
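The fix works because cancelling an asyncio task raises CancelledError at its current await, and the finally clause runs while that exception unwinds. The self-contained sketch below demonstrates this; send_task_sketch, pending, and reports are hypothetical names standing in for the plugin's input channel and collector.

```python
import asyncio
from typing import List


async def send_task_sketch(
    input_ch: "asyncio.Queue[float]", pending: List[float], reports: List[float]
) -> None:
    try:
        while True:
            duration = await input_ch.get()  # cancellation lands here
            pending.append(duration)  # accumulate like the collector would
    finally:
        # Runs on normal exit *and* on CancelledError, so nothing is lost.
        if pending:
            reports.append(sum(pending))
            pending.clear()


async def main() -> List[float]:
    queue: "asyncio.Queue[float]" = asyncio.Queue()
    pending: List[float] = []
    reports: List[float] = []
    task = asyncio.create_task(send_task_sketch(queue, pending, reports))
    for _ in range(3):
        queue.put_nowait(0.05)
    await asyncio.sleep(0)  # let the task drain the queue and block on get()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return reports


print(asyncio.run(main()))
```

Without the try/finally, the three buffered 50 ms durations would be dropped when the task is cancelled.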
🧹 Nitpick comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)

46-71: Consider constraining TypeVar T to types supporting addition.

The _PeriodicCollector relies on += (line 62), but the generic T has no constraint. This works for float but will fail at runtime for types without __add__. A protocol or bound would make the contract explicit.

Suggested improvement
-T = TypeVar("T")
+from typing import Protocol
+
+class Addable(Protocol):
+    def __add__(self: "T", other: "T") -> "T": ...
+
+T = TypeVar("T", bound=Addable)

Alternatively, since this is internal and only used with float, you could simply remove the generic and type it directly as float to avoid overengineering.
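If the generic is kept, here is one way the Addable bound could look in context, as a Python 3.9-compatible sketch. Accumulator and take() are hypothetical names for illustration; with the bound in place, a type checker should reject Accumulator[object], while float, int, and str should satisfy the protocol.

```python
from typing import Generic, Optional, Protocol, TypeVar

# Forward reference: the bound is resolved by the type checker, not at runtime.
T = TypeVar("T", bound="Addable")


class Addable(Protocol):
    def __add__(self: T, other: T) -> T: ...


class Accumulator(Generic[T]):
    """Sums pushed values of any type that supports ``+`` with itself."""

    def __init__(self) -> None:
        self._total: Optional[T] = None

    def push(self, value: T) -> None:
        # First value seeds the total; later values are added to it.
        self._total = value if self._total is None else self._total + value

    def take(self) -> Optional[T]:
        """Returns the current total and resets the accumulator."""
        total, self._total = self._total, None
        return total
```

The float-only alternative avoids the protocol entirely; the generic version is mainly worth it if the collector is reused for other summable metrics.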

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f46b591 and 8157ce1.

📒 Files selected for processing (1)
  • livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings

Files:

  • livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
⏰ Context from checks skipped due to timeout of 90000ms (6)
  • GitHub Check: livekit-plugins-openai
  • GitHub Check: livekit-plugins-cartesia
  • GitHub Check: livekit-plugins-deepgram
  • GitHub Check: unit-tests
  • GitHub Check: type-check (3.9)
  • GitHub Check: type-check (3.13)
🔇 Additional comments (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)

342-345: LGTM!

The collector initialization is clean and follows the pattern described in the PR objectives. The 5-second interval matches the expected behavior for metrics collection.


609-615: No issues found. The stt.RecognitionUsage(audio_duration=duration) API usage is correct and matches the dataclass definition in livekit-agents.


@Speediing changed the title from "feat(elevenlabs): add audio duration tracking for STT streaming" to "feat(elevenlabs): add audio duration tracking for ElevenLabs STT" on Jan 26, 2026