Experience
Architecting scalable infrastructure and models for humanized machine intelligence. (Hiring! Reach out if interested.)
Optimized training infrastructure for LLM & Video/Audio workloads.
Projects: Step-1o, Step-1.5V, Step-2, Step-3 AFD, StepMesh.
Pioneered the engineering frameworks for LLM pretraining and RL at Seed.
Prior projects include: BytePS/ByteCCL; Sparse MoE modeling and training.
(1) Experienced in architecting, optimizing, and diagnosing large-scale systems of over 10,000 GPUs.
(2) Responsible for training several (multi-modal) LLMs from scratch (cumulative compute > 1e26 FLOPs).
Selected Publications
See a full list at Google Scholar.