I am a researcher and a software engineer working on large-scale AI infrastructure.
My research interests lie in the broad area of designing and implementing systems and algorithms for AGI,
with a current focus on (multimodal) LLM pre-training and post-training.
I joined ByteDance AML in 2021, where I led the distributed training team
and built the first engineering framework for training foundation models, known as the Seed Project.
Prior to Seed, I led the ByteCCL and MoE projects at ByteDance,
both of which have been deployed on thousands of training and serving GPUs for ByteDance businesses.
I have experience optimizing and diagnosing training systems with more than 10,000 GPUs,
and have been deeply involved in training several (multimodal) LLMs from scratch at scales of up to 1e25 FLOPs.
Papers
[Preprint] StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation.
Yinmin Zhong, Zili Zhang, Xiaoniu Song, Hanpeng Hu, Chao Jin, Bingyang Wu, Nuo Chen, Yukun Chen, Yu Zhou, Changyi Wan, Hongyu Zhou, Yimin Jiang, Yibo Zhu, Daxin Jiang
[Preprint] PipeWeaver: Addressing Data Dynamicity in Large Multimodal Model Training with Dynamic Interleaved Pipeline.
Zhenliang Xue, Hanpeng Hu, Xing Chen, Yimin Jiang, Yixin Song, Zeyu Mi, Yibo Zhu, Daxin Jiang, Yubin Xia, Haibo Chen
[ASPLOS'26] Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training.
Xin Tan, Yuetao Chen, Yimin Jiang, Xing Chen, Kun Yan, Nan Duan, Yibo Zhu, Daxin Jiang, Hong Xu
[SIGCOMM'25] DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal LLMs.
Zili Zhang, Yinmin Zhong, Yimin Jiang, Hanpeng Hu, Jianjian Sun, Zheng Ge, Yibo Zhu, Xin Jin
[SIGCOMM'25] Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers.
Chenchen Shou, Guyue Liu, Hao Nie, Huaiyu Meng, Yu Zhou, Yimin Jiang, Wenqing Lv, Yelong Xu, Yuanwei Lu, Zhang Chen, Yanbo Yu, Yichen Shen, Yibo Zhu, Daxin Jiang
[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications.
Xin Tan, Yimin Jiang, Yitao Yang, Hong Xu
[NSDI'25] Optimizing RLHF Training for Large Language Models with Stage Fusion.
Yinmin Zhong, Zili Zhang, Bingyang Wu, Shengyu Liu, Yukun Chen, Changyi Wan, Hanpeng Hu, Lei Xia, Yimin Jiang, Yibo Zhu, Xin Jin
[EMNLP'23] Adaptive Gating in Mixture-of-Experts based Language Models.
Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong Xu
[SIGCOMM'23] Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models.
Juncai Liu, Jessie Hui Wang, Yimin Jiang
[ATC'23] Accelerating Distributed MoE Training and Inference with Lina.
Jiamin Li, Yimin Jiang, Yibo Zhu, Cong Wang, Hong Xu
[OSDI'20] BytePS: A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters.
Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, Chuanxiong Guo
Tech Reports
[arXiv] Step-Video-T2V: The Practice, Challenges, and Future of Video Foundation Model.*
[arXiv] Step-Video-TI2V: A State-of-the-Art Text-Driven Image-to-Video Generation Model.*
[arXiv] Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction.*
[arXiv] Step1X-Edit: A Practical Framework for General Image Editing.
[arXiv] Step-Audio-AQAA: A Fully End-to-End Expressive Large Audio Language Model.
Note: * indicates a core contribution.