Add denoiser support for Google STT plugin #4645
### Summary
- Add `denoise_audio` and `snr_threshold` parameters to Google STT plugin to support audio denoising and SNR filtering
- Upgrade `google-cloud-speech` dependency from `>= 2` to `>= 2.33` to enable `DenoiserConfig` support
### Description
This PR adds support for Google Cloud Speech-to-Text's denoiser and SNR filtering features, which help improve transcription accuracy in noisy environments.
**New parameters:**
- `denoise_audio` (bool): Enables audio denoising to reduce background noise such as music, rain, or street traffic. Note: cannot remove background human voices.
- `snr_threshold` (float): Sets the minimum speech loudness required for transcription, which helps filter out non-speech audio and background noise. Recommended values:
  - `10.0 - 100.0` when `denoise_audio=True`
  - `0.5 - 5.0` when `denoise_audio=False`
**Usage example:**
```python
from livekit.plugins.google import STT

stt = STT(
    model="chirp_3",
    location="us",
    denoise_audio=True,
    snr_threshold=20.0,  # medium sensitivity
)
```
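The recommended ranges above depend on whether denoising is enabled, so a value that suits one mode can fall outside the range for the other. The following is a minimal sketch of a range check, not part of this PR: the names `RECOMMENDED_SNR_RANGES` and `snr_threshold_in_recommended_range` are hypothetical, and only the numeric ranges come from the description.

```python
from typing import Dict, Tuple

# Recommended snr_threshold ranges quoted in the PR description,
# keyed by the value of denoise_audio.
RECOMMENDED_SNR_RANGES: Dict[bool, Tuple[float, float]] = {
    True: (10.0, 100.0),  # denoise_audio=True
    False: (0.5, 5.0),  # denoise_audio=False
}


def snr_threshold_in_recommended_range(denoise_audio: bool, snr_threshold: float) -> bool:
    """Return True if snr_threshold lies inside the recommended range (hypothetical helper)."""
    low, high = RECOMMENDED_SNR_RANGES[denoise_audio]
    return low <= snr_threshold <= high
```

With this helper, `snr_threshold=20.0` from the usage example is in range when `denoise_audio=True` but out of range when `denoise_audio=False`.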
### Changes
- `livekit-plugins/livekit-plugins-google/pyproject.toml`: Updated `google-cloud-speech` version requirement
- `livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py`:
  - Added `denoise_audio` and `snr_threshold` fields to `STTOptions`
  - Added `build_denoiser_config()` method to `STTOptions`
  - Updated `STT.__init__()` with new parameters
  - Updated `_build_recognition_config()` to include denoiser config for V2 API
  - Updated `SpeechStream._build_streaming_config()` to include denoiser config
  - Updated `update_options()` methods in both `STT` and `SpeechStream` classes
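To illustrate the shape of the `build_denoiser_config()` change, here is a rough sketch under stated assumptions; it is not the code from this PR. `STTOptionsSketch`, the `use_v2` flag, and the local `NOT_GIVEN` sentinel are stand-ins, and a plain dict takes the place of the `DenoiserConfig` object so the example runs without `google-cloud-speech` installed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


class _NotGiven:
    """Stand-in sentinel meaning "argument was not provided"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class STTOptionsSketch:
    """Hypothetical, trimmed-down stand-in for the plugin's STTOptions."""

    use_v2: bool  # assumption: the plugin's own V1/V2 check may differ
    denoise_audio: Union[bool, _NotGiven] = NOT_GIVEN
    snr_threshold: Union[float, _NotGiven] = NOT_GIVEN

    def build_denoiser_config(self) -> Optional[Dict[str, Any]]:
        """Return denoiser settings, or None when they do not apply."""
        if not self.use_v2:
            return None  # denoiser settings are only attached for the V2 API
        kwargs: Dict[str, Any] = {}
        if not isinstance(self.denoise_audio, _NotGiven):
            kwargs["denoise_audio"] = self.denoise_audio
        if not isinstance(self.snr_threshold, _NotGiven):
            kwargs["snr_threshold"] = self.snr_threshold
        # An empty dict means neither option was given, so nothing is injected.
        return kwargs or None
```

Returning `None` when nothing is set lets the config builders skip the denoiser field entirely, which keeps existing callers unaffected.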
### References
- [Google Cloud Speech-to-Text Chirp 3 Documentation](https://cloud.google.com/speech-to-text/docs/models/chirp-3)
📝 Walkthrough
Added `denoise_audio` and `snr_threshold` options to Google STT interfaces and options; implemented `STTOptions.build_denoiser_config()`; updated V2 recognition and streaming config assembly to inject `DenoiserConfig` when present; bumped `google-cloud-speech` minimum to 2.33.
Sequence Diagram(s)
sequenceDiagram
    participant Client as Client
    participant STT as STT
    participant SpeechStream as SpeechStream
    participant GoogleAPI as GoogleAPI
    Client->>STT: create/update (denoise_audio, snr_threshold)
    STT->>STT: update STTOptions with denoising fields
    STT->>SpeechStream: propagate updated options
    SpeechStream->>STT: request recognition config
    STT->>STT: build_denoiser_config() -> DenoiserConfig?
    STT->>GoogleAPI: StreamingRecognitionConfig (RecognitionConfig + optional DenoiserConfig)
    GoogleAPI-->>SpeechStream: streaming transcripts/events
    SpeechStream-->>Client: deliver transcripts
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py`:
- Around line 323-343: The diagnostic warns about using a bare dict annotation
for recognition_config_kwargs; update its type to a parameterized Mapping/Dict
with concrete key/value types (e.g., Dict[str, Any] or Mapping[str, object]) so
mypy strict mode doesn't infer Any; locate the variable
recognition_config_kwargs in the function that constructs and returns
cloud_speech_v2.RecognitionConfig (the block that builds
recognition_config_kwargs and returns
RecognitionConfig(**recognition_config_kwargs)) and change the annotation there
(and similarly for the other dict usage around the 555-576 region) to a properly
parameterized type such as Dict[str, object] or Mapping[str, Any].
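As a hedged illustration of the suggested fix, the sketch below shows a parameterized annotation on a kwargs dict. The function `build_recognition_kwargs` and its arguments are hypothetical and only mirror the variable name from the comment; the real code passes the dict to `cloud_speech_v2.RecognitionConfig`.

```python
from typing import Any, Dict, Optional


def build_recognition_kwargs(
    model: str, denoiser_config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Assemble keyword arguments for a recognition config (hypothetical helper)."""
    # Parameterized annotation: a bare `dict` would be reported as a generic type
    # with missing type parameters under mypy strict mode.
    recognition_config_kwargs: Dict[str, Any] = {"model": model}
    if denoiser_config is not None:
        recognition_config_kwargs["denoiser_config"] = denoiser_config
    return recognition_config_kwargs
```

`Dict` from `typing` is used here rather than the built-in `dict[...]` form, which fits the Python 3.9+ compatibility guideline listed in the review details.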
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py
🧬 Code graph analysis (1)
livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py (2)
- livekit-agents/livekit/agents/utils/misc.py (1): is_given (25-26)
- livekit-agents/livekit/agents/stt/stt.py (1): model (115-124)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: unit-tests
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.9)
🔇 Additional comments (5)
livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py (5)
81-84: Good addition of denoise/SNR fields with NOT_GIVEN defaults.
Keeps backward compatibility while exposing the new options.
119-134: DenoiserConfig builder cleanly scoped to V2 + optional inputs.
The guardrails look solid and avoid V1 misuse.
156-193: Constructor docs and config wiring for denoise/SNR look solid.
Clear parameter behavior and correct propagation into `STTOptions`.
Also applies to: 247-248
435-480: Option updates now propagate denoise/SNR to active streams.
Looks consistent with the rest of the option updates.
518-547: SpeechStream option updates correctly carry denoise/SNR and trigger reconnect.
The update path stays coherent with other config changes.