Yimin Jiang (江逸敏)


Experience

  • Anuttacon | Lead of AI Infrastructure | 2025 - Present
    Architecting scalable infrastructure and models for humanized machine intelligence. (Hiring! Reach out if interested.)
  • StepFun | Lead of Training Framework | 2023 - 2025
    Optimized training infrastructure for LLM & Video/Audio workloads.
    Projects: Step-1o, Step-1.5V, Step-2, Step-3 AFD, StepMesh.
  • ByteDance (AML->Seed) | Senior Research Scientist | 2021 - 2023
    Pioneered the engineering frameworks for LLM pretraining and RL at Seed.
    Prior projects: BytePS/ByteCCL; sparse MoE modeling and training.
  • Technical Highlights:
    (1) Experienced in architecting, optimizing, and diagnosing large-scale systems of over 10,000 GPUs.
    (2) Responsible for training several (multimodal) LLMs from scratch (cumulative FLOPs > 1e26).


Selected Publications

  • [MLSYS'26] Efficient Long-context Language Model Training by Core Attention Disaggregation.
  • [Tech Report] Model-system Co-design for Cost-effective Decoding.
  • [ASPLOS'26] PipeWeaver: Addressing Data Dynamicity in Large Multimodal Model Training with Dynamic Interleaved Pipeline.
  • [ASPLOS'26] Exploiting Dynamic Sparsity to Accelerate Large-Scale Video DiT Training.
  • [SIGCOMM'25] DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal LLMs.
  • [SIGCOMM'25] Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers.
  • [ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications.
  • [NSDI'25] Optimizing RLHF Training for Large Language Models with Stage Fusion.
  • [SIGCOMM'23] Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models.
  • [ATC'23] Accelerating Distributed MoE Training and Inference with Lina.
  • [OSDI'20] BytePS: A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters.

See the full list on Google Scholar.