Haoran Qiu | Microsoft AzRS
Autoscaling
ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving
Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, videos, and audio beyond text. However, efficiently serving LMMs in production environments poses significant challenges due to their complex architectures …