Haoran Qiu | Microsoft AzRS
Shengkun Cui
Latest
Power-aware Deep Learning Model Serving with µ-Serve
FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
QLM: Queue Management for Large Language Model Serving