Describe the bug
Looking at this code snippet:
const commitUserTurnTask =
  (delayDuration: number = 500) =>
  async (controller: AbortController) => {
    if (Date.now() - this.lastFinalTranscriptTime > delayDuration) {
      // flush the stt by pushing silence
      if (audioDetached && this.sampleRate !== undefined) {
        const numSamples = Math.floor(this.sampleRate * 0.5);
        const silence = new Int16Array(numSamples * 2);
        const silenceFrame = new AudioFrame(silence, this.sampleRate, 1, numSamples);
        this.silenceAudioWriter.write(silenceFrame);
      }

      // wait for the final transcript to be available
      await delay(delayDuration, { signal: controller.signal });
    }

    if (this.audioInterimTranscript) {
      // append interim transcript in case the final transcript is not ready
      this.audioTranscript = `${this.audioTranscript} ${this.audioInterimTranscript}`.trim();
    }
    this.audioInterimTranscript = '';

    const chatCtx = this.hooks.retrieveChatCtx();
    this.logger.debug('running EOU detection on commitUserTurn');
    this.runEOUDetection(chatCtx);
    this.userTurnCommitted = true;
  };
I noticed that a 500 ms delay is always added, even for a realtime pipeline with no STT turned on. I'm using manual turn detection, where I call commitUserTurn() myself, and this adds a 500 ms delay before the speech handle gets created.
I verified that the delay() function is called even when there is no transcription.
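For anyone who wants to confirm the latency themselves, a small timing wrapper is enough. This is only a sketch: `timeAsync` and `fakeCommit` are hypothetical names, not part of @livekit/agents, and `fakeCommit` is a stand-in for whatever call leads to the speech handle in your own code.

```typescript
// Hypothetical helper, not part of @livekit/agents: times any async call.
async function timeAsync<T>(
  fn: () => Promise<T>,
): Promise<{ result: T; elapsedMs: number }> {
  const start = Date.now();
  const result = await fn();
  return { result, elapsedMs: Date.now() - start };
}

// Stand-in for the code path under test (assumption): resolves after 50 ms.
// Replace it with the real call you want to measure.
const fakeCommit = (): Promise<string> =>
  new Promise<string>((resolve) => setTimeout(() => resolve('committed'), 50));

// Example usage: log how long the wrapped call took.
async function measureOnce(): Promise<void> {
  const { elapsedMs } = await timeAsync(fakeCommit);
  console.log(`commit path took ~${elapsedMs} ms`);
}
```

Wrapping the real call this way should show whether the extra wait is close to the default `delayDuration` of 500.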
Here's my config:
const session = new voice.AgentSession({
  llm: new openai.realtime.RealtimeModel({
    model: 'gpt-realtime-mini',
    turnDetection: null,
    modalities: ['audio', 'text'],
  }),
  turnDetection: 'manual',
  voiceOptions: {
    preemptiveGeneration: false,
    minEndpointingDelay: 0,
    maxEndpointingDelay: 0,
    minInterruptionDuration: 0,
    allowInterruptions: false,
  },
});
await session.start({
  agent: this.agent,
  room: this.room,
  inputOptions: {
    audioEnabled: true,
    textEnabled: false,
  },
  outputOptions: {
    audioEnabled: false,
    transcriptionEnabled: false,
  },
});
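One possible direction for a fix, sketched under assumptions: decide the wait time up front and return 0 when no STT is attached. The names `TurnCommitState`, `sttEnabled`, and `transcriptWaitMs` are hypothetical and do not exist in the library; the threshold check mirrors the condition in the snippet quoted at the top of this issue.

```typescript
// Hypothetical names throughout; this is not the library's actual API.
interface TurnCommitState {
  sttEnabled: boolean; // assumed flag: true only when an STT is attached
  lastFinalTranscriptTime: number; // ms timestamp of the last final transcript
}

// Returns how many milliseconds to wait for a final transcript before
// committing the user turn. Without an STT there is nothing to wait for.
function transcriptWaitMs(
  state: TurnCommitState,
  delayDuration: number = 500,
  now: number = Date.now(),
): number {
  if (!state.sttEnabled) return 0;
  // Same check as the quoted snippet: only wait if no final transcript
  // arrived within the last `delayDuration` milliseconds.
  const sinceLastFinal = now - state.lastFinalTranscriptTime;
  return sinceLastFinal > delayDuration ? delayDuration : 0;
}
```

With a guard like this, the commit task would only call delay() when the returned value is greater than zero, so a realtime-only session with manual turn detection would skip the wait entirely.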
Relevant log output
No response
Describe your environment
System:
  OS: macOS 14.7
  CPU: (10) arm64 Apple M1 Max
  Memory: 98.83 MB / 32.00 GB
  Shell: 5.9 - /bin/zsh
Binaries:
  Node: 24.11.1 - ~/.nvm/versions/node/v24.11.1/bin/node
  npm: 11.6.2 - ~/.nvm/versions/node/v24.11.1/bin/npm
  pnpm: 10.25.0 - /opt/homebrew/bin/pnpm
  Watchman: 2025.11.10.00 - /opt/homebrew/bin/watchman
"@livekit/agents": "1.0.30",
"@livekit/agents-plugin-livekit": "1.0.30",
"@livekit/agents-plugin-openai": "1.0.30",
Minimal reproducible example
No response
Additional information
No response
Code reference: agents-js/agents/src/voice/audio_recognition.ts, lines 641 to 667 in 455b5ba