Thank you very much for your work. Is it possible to accelerate the S2S mode as well, and if so, when is support expected?
I am currently trying to build an S2S version on top of the existing S2T vLLM acceleration, and I have hit a major issue: in `FunAudioChatDecoder`, `crq_generate_forward` (the SRH head) needs to sample the next token based on `speech_id`, and that sampled token then becomes the next input to the SRH. vLLM, however, handles the updating and merging of `speech_id` at the engine level, so it is not clear to me how to run this sample-and-feed-back loop inside the model's forward pass. I would also appreciate guidance from experts on this problem.
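To make the dependency concrete, here is a toy sketch of the loop I mean. It is not the actual FunAudioChat or vLLM code: `toy_srh_step`, `sample_greedy`, `decode_speech_group`, and the `group_size` and `vocab_size` parameters are hypothetical stand-ins, and the logits are made up. It only shows that each sampled token has to be fed back as the next input before the following step can run.

```python
def toy_srh_step(hidden, prev_token, vocab_size):
    """Hypothetical stand-in for one SRH step: returns toy logits over speech tokens."""
    # The logits depend on both the backbone state and the previously sampled token.
    return [((hidden + prev_token * 31 + i * 17) % 7) / 7.0 for i in range(vocab_size)]


def sample_greedy(logits):
    """Pick the index of the largest logit (first one on ties)."""
    return max(range(len(logits)), key=logits.__getitem__)


def decode_speech_group(hidden, start_token, group_size, vocab_size=8):
    """Run the sample-and-feed-back loop for a fixed number of steps."""
    tokens = []
    prev = start_token
    for _ in range(group_size):
        logits = toy_srh_step(hidden, prev, vocab_size)
        prev = sample_greedy(logits)  # sampling happens inside the loop
        tokens.append(prev)           # the sampled token is the next step's input
    return tokens


if __name__ == "__main__":
    print(decode_speech_group(hidden=3, start_token=0, group_size=5))
```

In this sketch the sampling step sits inside the decoding loop itself, which is the part I do not know how to express when the engine owns the `speech_id` update.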