AGT-2474: add commit user turn support for realtime models #4622
Conversation
📝 Walkthrough
Adds an abstract commit_user_turn() to RealtimeSession and implements it across realtime provider plugins; agent_activity now invokes the realtime session's commit_user_turn() when present and still proceeds to the audio-recognition commit path. Provider implementations either perform finalization (OpenAI) or log unsupported warnings.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
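For a rough sense of the shape described above, here is a minimal sketch; aside from commit_user_turn itself, the class and method names are illustrative assumptions, not the PR's actual code:

import logging
from abc import ABC, abstractmethod

logger = logging.getLogger(__name__)

class RealtimeSession(ABC):
    @abstractmethod
    def commit_user_turn(self) -> None:
        """Commit the buffered user input and ask the model to respond."""

class UnsupportedProviderSession(RealtimeSession):
    # providers without a manual-commit concept just log a warning,
    # as the walkthrough notes
    def commit_user_turn(self) -> None:
        logger.warning("commit_user_turn is not supported by this realtime model")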
bml1g12
left a comment
This looks good to me, as it means the docs here https://docs.livekit.io/agents/logic/turns/#manual would also apply to realtime models.
It might make sense to have clear_user_turn() also call self.clear_audio() for realtime models? I say this because I think https://docs.livekit.io/agents/logic/turns/#manual would then fully apply:
# When user starts speaking
@ctx.room.local_participant.register_rpc_method("start_turn")
async def start_turn(data: rtc.RpcInvocationData):
    session.interrupt()  # Stop any current agent speech
    session.clear_user_turn()  # Clear any previous input
    session.input.set_audio_enabled(True)  # Start listening
For cascaded models clear_user_turn() clears any previous model input, but for realtime models I think we also need to clear the audio (see the sketch below).
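For illustration, a minimal sketch of that idea, assuming the activity keeps a reference to the realtime session; the exact attribute and method names are assumptions, not the actual implementation:

def clear_user_turn(self) -> None:
    if self._rt_session is not None:
        # hypothetical: also drop any audio already buffered on the realtime
        # session so the next turn starts from an empty input_audio_buffer
        self._rt_session.clear_audio()
    # keep the existing cascaded-path behavior of discarding pending user input
    self._audio_recognition.clear_user_turn()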
        response=RealtimeResponseCreateParams(),
    )
)
self.clear_audio()
why a clear_audio is needed here?
Send input_audio_buffer.clear before beginning a new user input.
I think this means you need to clear the buffer before the next time you want to start a new user speech, not that it's required after response.create for this turn.
Maybe it's similar to the session.clear_user_turn() in the example @bml1g12 mentioned above.
Yeah, I can put that call in the clear_user_turn part.
Turns out we don't need this if we call session.clear_user_turn.
Ah yes, clear_user_turn already calls clear_audio() under the hood it seems, so indeed you probably want to clear audio when you start a new turn, not when you end the turn - and that indeed means it's probably not needed in this PR.
longcw
left a comment
lgtm, my last question:
def commit_user_turn(self, *, transcript_timeout: float, stt_flush_duration: float) -> None:
    if self._rt_session is not None:
        self._rt_session.commit_user_turn()
        return
can we commit to both rt session and audio recognition?
Yeah, that's a good point. I don't see why we can't. I will skip the return here.
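For illustration, dropping the early return might look roughly like this; the audio-recognition call and its signature are assumptions based on the existing STT commit path, not the exact diff:

def commit_user_turn(self, *, transcript_timeout: float, stt_flush_duration: float) -> None:
    if self._rt_session is not None:
        # commit the buffered audio on the realtime session
        self._rt_session.commit_user_turn()
    # no early return: also run the audio-recognition (STT) commit path
    self._audio_recognition.commit_user_turn(
        transcript_timeout=transcript_timeout,
        stt_flush_duration=stt_flush_duration,
    )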
I think that would be relevant when using external STT with the OpenAI realtime model, right? Actually we are currently exploring that, i.e. using manual turn taking with a discrete STT engine (not using OpenAI's server-side Whisper as STT).
I wonder, if we take this approach, whether it's possible to ensure the local and remote chat contexts are consistent, and also that we have a local chat context we can use to recover the realtime session if there are any networking issues.
It seems to me that when we commit the STT transcripts to the local context (which maybe is what would happen if we commit the audio recognition in this code?) we get a healthy-looking local LiveKit chat context, but that somehow (presumably by LiveKit design) gets upserted automatically to the remote OpenAI chat context, and I'm not sure whether that's safe, given that from the OpenAI server-side perspective they automatically add an empty user "content" message when server-side STT is disabled (which I assume represents OpenAI's internal representation of those audio tokens, which is not human readable, hence no "content").
This Slack post explains in more detail what I am referring to.
It looks to me that if using an external STT, we would ideally want to put the user STT transcriptions into the local chat context (i.e. so it can be recovered if the server connection is broken), but there is no need to upsert them to the remote context (because the OpenAI realtime API model has no need at all for user-side STT transcriptions, to my understanding, given it processes raw audio tokens). Is that possible in LiveKit, I wonder? I imagine LiveKit typically tries to keep local and remote contexts in sync, which creates a (theoretical) problem here when we have local STT not needed by the remote model. Although this is a theoretical problem, I should note I have not yet seen any adverse effects of it; I'm just being cautious, as it would be tough to spot if there were an issue.
STT is essentially emitting the final transcript for the realtime model in this case, and the realtime model doesn't really process/respond to final transcripts (different from generate_reply, which is direct user input).
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@livekit-agents/livekit/agents/voice/agent_activity.py`:
- Around line 978-981: Wrap the realtime call in commit_user_turn so exceptions
from self._rt_session.commit_user_turn() (e.g., RealtimeError) are caught and do
not prevent the STT commit from running; specifically, in the commit_user_turn
method surround the call to self._rt_session.commit_user_turn() with a
try/except that catches RealtimeError (or a broad Exception if RealtimeError
isn't imported) and ignores or logs it, then ensure the subsequent STT commit
still executes unconditionally.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
livekit-agents/livekit/agents/voice/agent_activity.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Format code with ruff
Run ruff linter and auto-fix issues
Run mypy type checker in strict mode
Maintain line length of 100 characters maximum
Ensure Python 3.9+ compatibility
Use Google-style docstrings
Files:
livekit-agents/livekit/agents/voice/agent_activity.py
🧠 Learnings (1)
📚 Learning: 2026-01-22T03:28:16.289Z
Learnt from: longcw
Repo: livekit/agents PR: 4563
File: livekit-agents/livekit/agents/beta/tools/end_call.py:65-65
Timestamp: 2026-01-22T03:28:16.289Z
Learning: In code paths that check capabilities or behavior of the LLM processing the current interaction, prefer using the activity's LLM obtained via ctx.session.current_agent._get_activity_or_raise().llm instead of ctx.session.llm. The session-level LLM may be a fallback and not reflect the actual agent handling the interaction. Use the activity LLM to determine capabilities and to make capability checks or feature toggles relevant to the current processing agent.
Applied to files:
livekit-agents/livekit/agents/voice/agent_activity.py
🧬 Code graph analysis (1)
livekit-agents/livekit/agents/voice/agent_activity.py (2)
livekit-agents/livekit/agents/voice/audio_recognition.py (1)
  commit_user_turn (248-320)
livekit-agents/livekit/agents/voice/agent_session.py (1)
  commit_user_turn (978-998)
⏰ Context from checks skipped due to timeout of 90000ms (4). You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms).
- GitHub Check: livekit-plugins-deepgram
- GitHub Check: unit-tests
- GitHub Check: type-check (3.9)
- GitHub Check: type-check (3.13)
def commit_user_turn(self, *, transcript_timeout: float, stt_flush_duration: float) -> None:
    if self._rt_session is not None:
        self._rt_session.commit_user_turn()
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for commit_user_turn implementation and exception types
rg -n "def commit_user_turn" --type py -A 5Repository: livekit/agents
Length of output: 6596
🏁 Script executed:
#!/bin/bash
# Search for llm.RealtimeError or similar exception definitions
rg -n "class.*RealtimeError|RealtimeError" --type py -A 2Repository: livekit/agents
Length of output: 9008
🏁 Script executed:
#!/bin/bash
# Check the context around lines 978-981 to see what follows commit_user_turn
sed -n '975,990p' livekit-agents/livekit/agents/voice/agent_activity.py
Repository: livekit/agents
Length of output: 782
🏁 Script executed:
#!/bin/bash
# Search for AudioRecognition.commit_user_turn to understand the full flow
rg -n "AudioRecognition.*commit_user_turn|\.commit_user_turn" livekit-agents/livekit/agents/voice/agent_activity.py -B 2 -A 2Repository: livekit/agents
Length of output: 494
Wrap realtime commit to ensure STT commit always runs.
The _rt_session.commit_user_turn() call should be guarded with exception handling. While current provider implementations don't raise exceptions, this ensures the STT commit at line 983 always executes even if a future provider implementation raises RealtimeError. This pattern is already established elsewhere in the codebase for other realtime operations.
Suggested fix
  if self._rt_session is not None:
-     self._rt_session.commit_user_turn()
+     try:
+         self._rt_session.commit_user_turn()
+     except llm.RealtimeError:
+         logger.exception("failed to commit user turn on realtime session")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def commit_user_turn(self, *, transcript_timeout: float, stt_flush_duration: float) -> None:
    if self._rt_session is not None:
        try:
            self._rt_session.commit_user_turn()
        except llm.RealtimeError:
            logger.exception("failed to commit user turn on realtime session")
This adds commit_user_turn support for realtime models. This allows users to use turn_detection="manual" with a realtime model.
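For context, a minimal usage sketch mirroring the manual-turn docs example quoted earlier in this thread; the end_turn RPC name and its wiring are illustrative, not part of this PR:

# When the user finishes speaking (e.g. push-to-talk release)
@ctx.room.local_participant.register_rpc_method("end_turn")
async def end_turn(data: rtc.RpcInvocationData):
    session.input.set_audio_enabled(False)  # Stop listening
    session.commit_user_turn()  # Commit the pending input so the realtime model responds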