Pull requests: alibaba/rtp-llm

Pull requests list

amd qwen35 optimize fused_l2norm_qk
#920 opened Apr 22, 2026 by hxy0118 Collaborator
update: update kvcm client
#918 opened Apr 21, 2026 by lucky-zzz Collaborator
feat: refactor py model device
#917 opened Apr 21, 2026 by JackTan25 Collaborator
Defer engine and RPC loop start until after full server init
#916 opened Apr 21, 2026 by xinfei-shi Collaborator
Support batch_prefill && TPS bench mode
#914 opened Apr 21, 2026 by alibaba-miji Collaborator
6 tasks done
Feature/p2p connector complete
#910 opened Apr 17, 2026 by ZhihanYan Collaborator
refactor: refactor codes
#909 opened Apr 17, 2026 by JackTan25 Collaborator
perf: optimize MoE model weight loading (8.6x speedup)
#908 opened Apr 17, 2026 by netaddi Collaborator
3 tasks
Feat/hybrid cp gdn
#906 opened Apr 17, 2026 by yang1556 Collaborator
feat: support input_embeddings in inference pipeline
#905 opened Apr 17, 2026 by KrisCheng9 Collaborator
feat: upgrade rocm6.4.3 to rocm7.2.0
#904 opened Apr 16, 2026 by liaocz Collaborator
optimize beam search
#903 opened Apr 16, 2026 by parkerpang
feat: support xgrammer
#902 opened Apr 16, 2026 by wanglining97 Collaborator
[ROCm] Optimize Qwen3.5 with fused kernel and allreduce merging
#900 opened Apr 16, 2026 by chengshu-lcc Collaborator
feat: add Kimi Linear (KDA) model support
#899 opened Apr 16, 2026 by theNiemand Collaborator
feat: Qwen3.5 Blackwell GDN prefill optimization
#897 opened Apr 15, 2026 by netaddi Collaborator
3 tasks
Constrained decoding modifications
#893 opened Apr 14, 2026 by Glen11111Z
Gb200 Qwen3.5 NVFP4
#888 opened Apr 14, 2026 by qqbbiu Collaborator
fix: fix nvfp4 dp2 cuda graph smoke crash bug
#887 opened Apr 14, 2026 by JackTan25 Collaborator
feat: more production robust
#885 opened Apr 13, 2026 by yyhclimacool
Implement true EP (Expert Parallelism) mode for Qwen3 ROCm MoE
#884 opened Apr 13, 2026 by Xu-Sheng-lin Collaborator
feat: [ROCm] support FP8 PTPC/PerBlock quantization for Qwen3.5
#882 opened Apr 13, 2026 by chengshu-lcc Collaborator
feat - optimize gemm weights load logic
#880 opened Apr 13, 2026 by alibaba-miji Collaborator
Feat: add concurrency test for mtp
#878 opened Apr 13, 2026 by JackTan25 Collaborator