Yimin Jiang (江逸敏)

Email: jymthu (at) gmail (dot) com [LinkedIn]

I am a researcher and software engineer working on large-scale AI infrastructure. Currently, I am interested in designing and optimizing systems and algorithms for AGI, with a focus on (multi-modal) LLMs, Mixture-of-Experts (MoE), and generative models for image and video.

I have worked at ByteDance AML since 2021, where I lead the distributed training team and built the first engineering framework for LLM pretraining and RLHF-based alignment, known as the Seed Project. Prior to Seed, I was the tech lead of two core projects: (1) BytePS, collective communication optimizations for PyTorch and TensorFlow; and (2) MoE for NLP and multi-modality encoders, scaling base model capacity with a large number of experts. Both BytePS and the MoE models have been deployed on thousands of training and serving GPUs for ByteDance businesses such as Douyin, Search, and RecSys.

I have experience optimizing and diagnosing systems involving more than 10,000 GPUs, and have been deeply involved in training models at scales of up to 10^25 FLOPs.

I received my PhD in Computer Science from Tsinghua University in 2021, and my BS from Nanjing University in 2016.


Selected Papers

DSV: Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training.
Xin Tan, Yuetao Chen, Yimin Jiang, Xing Chen, Kun Yan, Nan Duan, Yibo Zhu, Daxin Jiang, Hong Xu
Preprint [PDF]

InfinitePOD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers.
Chenchen Shou, Guyue Liu, Hao Nie, Huaiyu Meng, Yu Zhou, Yimin Jiang, Wenqing Lv, Yelong Xu, Yuanwei Lu, Zhang Chen, Yanbo Yu, Yichen Shen, Yibo Zhu, Daxin Jiang
Preprint [PDF]

Teola: Towards End-to-End Optimization of LLM-based Applications.
Xin Tan, Yimin Jiang, Yitao Yang, Hong Xu
ASPLOS 2025 [PDF]

Adaptive Gating in Mixture-of-Experts based Language Models.
Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong Xu
EMNLP 2023 [PDF]

Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models.
Juncai Liu, Jessie Hui Wang, Yimin Jiang
SIGCOMM 2023 [PDF]

Accelerating Distributed MoE Training and Inference with Lina.
Jiamin Li, Yimin Jiang, Yibo Zhu, Cong Wang, Hong Xu
ATC 2023 [PDF]

BytePS: A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters.
Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, Chuanxiong Guo
OSDI 2020 [PDF] [CODE]