feat(elevenlabs): add audio duration tracking for ElevenLabs STT #4629
base: main
Conversation
Add RECOGNITION_USAGE event emission to ElevenLabs scribe_v2_realtime STT streaming mode. This reports audio_duration metrics, matching the pattern used by other STT providers (Deepgram, Gladia, FireworksAI).

Uses a PeriodicCollector to accumulate and report audio duration every 5 seconds during streaming transcription.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
📝 Walkthrough
The change introduces an internal generic _PeriodicCollector that buffers audio frame durations and periodically flushes the accumulated total to a callback. The collector is integrated into the STT and SpeechStream classes to accumulate audio duration and emit RECOGNITION_USAGE events without modifying existing public APIs.
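The buffering-and-flushing behaviour described in the walkthrough can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual `_PeriodicCollector`: the class name, the `clock` parameter, and the fixed 0.5-second frames are assumptions chosen so the example runs deterministically without real audio.

```python
import time
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class PeriodicCollector(Generic[T]):
    """Hypothetical sketch: sums pushed values, reports the total once per interval."""

    def __init__(
        self,
        callback: Callable[[T], None],
        *,
        duration: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for testing (assumption)
    ) -> None:
        self._callback = callback
        self._duration = duration
        self._clock = clock
        self._last_flush = clock()
        self._total: Optional[T] = None

    def push(self, value: T) -> None:
        # Accumulate, then flush if the interval has elapsed since the last flush.
        if self._total is None:
            self._total = value
        else:
            self._total += value  # type: ignore[operator]
        if self._clock() - self._last_flush >= self._duration:
            self.flush()

    def flush(self) -> None:
        # Report whatever has accumulated (if anything) and restart the interval.
        if self._total is not None:
            self._callback(self._total)
            self._total = None
        self._last_flush = self._clock()


# Simulate 6 seconds of audio as twelve 0.5 s frames with a fake clock.
now = [0.0]
reported: List[float] = []
collector = PeriodicCollector(reported.append, duration=5.0, clock=lambda: now[0])
for _ in range(12):
    now[0] += 0.5
    collector.push(0.5)
collector.flush()  # final flush picks up the remainder after the last interval
```

In the plugin, the callback would be the place where the usage event is emitted; here it simply appends to a list so the reported totals can be inspected.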
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)
388-402: Add try-finally to send_task to ensure flush on cancellation.

If send_task is cancelled while processing frames (via gracefully_cancel at line 459), the flush() at line 402 never executes, so any audio duration accumulated since the last periodic flush is lost. FireworksAI's plugin handles this correctly by calling flush() in the outer finally block.

Wrap the async for loop in a try-finally:

Suggested fix

    async def send_task(ws: aiohttp.ClientWebSocketResponse) -> None:
        nonlocal closing_ws
        # ... setup code ...
        try:
            async for data in self._input_ch:
                # ... frame processing ...
        finally:
            self._audio_duration_collector.flush()
            closing_ws = True
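To see why the finally block matters, the following self-contained sketch cancels a consumer task mid-stream and checks that the pending duration is still reported. It uses a plain asyncio.Queue and lists in place of the plugin's input channel and collector; those stand-ins, and this simplified `send_task` signature, are assumptions made for illustration only.

```python
import asyncio
from typing import List


async def send_task(
    frames: "asyncio.Queue[float]", pending: List[float], flushed: List[float]
) -> None:
    try:
        while True:
            # Stand-in for iterating the input channel: wait for the next frame duration.
            pending.append(await frames.get())
    finally:
        # Runs on normal exit and when the task is cancelled,
        # so durations accumulated since the last flush are not lost.
        flushed.append(sum(pending))
        pending.clear()


async def main() -> List[float]:
    frames: "asyncio.Queue[float]" = asyncio.Queue()
    pending: List[float] = []
    flushed: List[float] = []
    task = asyncio.create_task(send_task(frames, pending, flushed))
    for _ in range(3):
        frames.put_nowait(0.5)
    await asyncio.sleep(0.01)  # let the task consume the queued frames
    task.cancel()              # simulate cancellation while it waits for more
    try:
        await task
    except asyncio.CancelledError:
        pass
    return flushed


result = asyncio.run(main())
```

Without the try-finally, the cancellation would propagate out of the `await` and the 1.5 seconds held in `pending` would never reach `flushed`.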
🧹 Nitpick comments (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (1)
46-71: Consider constraining TypeVar T to types supporting addition.

The _PeriodicCollector relies on += (line 62), but the generic T has no constraint. This works for float, but will fail at runtime for types without __add__. A protocol or bound would make the contract explicit.

Suggested improvement

    -T = TypeVar("T")
    +from typing import Protocol
    +
    +class Addable(Protocol):
    +    def __add__(self: "T", other: "T") -> "T": ...
    +
    +T = TypeVar("T", bound=Addable)

Alternatively, since this is internal and only used with float, you could simply remove the generic and type it directly as float to avoid overengineering.
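As a rough illustration of the bound suggested in this nitpick, the sketch below defines an Addable protocol and a small generic accumulator that uses it. The `Accumulator` class is hypothetical and exists only to show the bound in use; it is not part of the PR.

```python
from datetime import timedelta
from typing import Generic, Optional, Protocol, TypeVar

# The bound is given as a forward reference because Addable is defined just below.
T = TypeVar("T", bound="Addable")


class Addable(Protocol):
    """Structural type: anything whose instances can be added to each other."""

    def __add__(self: T, other: T) -> T: ...


class Accumulator(Generic[T]):
    """Hypothetical helper that sums pushed values of a single Addable type."""

    def __init__(self) -> None:
        self._total: Optional[T] = None

    def push(self, value: T) -> None:
        # With the bound in place, a type checker can verify that `+` is supported.
        self._total = value if self._total is None else self._total + value

    @property
    def total(self) -> Optional[T]:
        return self._total


seconds: Accumulator[float] = Accumulator()
seconds.push(0.5)
seconds.push(0.25)

spans: Accumulator[timedelta] = Accumulator()
spans.push(timedelta(seconds=1))
spans.push(timedelta(seconds=2))
```

The bound only affects static checking; at runtime the behaviour is the same as with an unconstrained TypeVar, which is why typing the collector directly as float remains a reasonable alternative.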
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: livekit-plugins-openai
- GitHub Check: livekit-plugins-cartesia
- GitHub Check: livekit-plugins-deepgram
- GitHub Check: unit-tests
- GitHub Check: type-check (3.9)
- GitHub Check: type-check (3.13)
🔇 Additional comments (2)
livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/stt.py (2)
342-345: LGTM! The collector initialization is clean and follows the pattern described in the PR objectives. The 5-second interval matches the expected behavior for metrics collection.
609-615: No issues found. The stt.RecognitionUsage(audio_duration=duration) API usage is correct and matches the dataclass definition in livekit-agents.
Summary
- Add RECOGNITION_USAGE event emission to ElevenLabs scribe_v2_realtime STT streaming mode
- Report audio_duration metrics via periodic collection (every 5 seconds)

Test plan
- STT metrics {"audio_duration": X.X} appears in logs

🤖 Generated with Claude Code