About Me
I am a second-year Ph.D. student at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University (SJTU), advised by Prof. Dahua Lin. Before joining SJTU, I received my Bachelor's degree from Shandong University (SDU).
My research interests focus on building efficient deep learning systems through algorithm-system co-design, with a particular emphasis on optimizing AI workload performance on hardware accelerators such as GPUs. I am dedicated to bridging the gap between theoretical models and practical implementations to deliver scalable and economical AI solutions.
Publications
Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler (arXiv 2504.19442)
Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Tianqi Li, Haojie Duanmu, et al.
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design (ICML 2025)
Haojie Duanmu, Xiuhong Li, Zhihang Yuan, Size Zheng, Jiangfei Duan, Xingcheng Zhang, and Dahua Lin
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models (COLM 2024 Spotlight Oral💡)
Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, and Dahua Lin
MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving (ICML 2024)
Jiangfei Duan, Runyu Lu, Haojie Duanmu, Xiuhong Li, Xingcheng Zhang, Dahua Lin, Ion Stoica, and Hao Zhang
WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More
Yuxuan Yue, Zhihang Yuan, Haojie Duanmu, Sifan Zhou, Jianlong Wu, and Liqiang Nie