Multimodal

StreamWise: Serving Multi-Modal Generation in Real-Time at Scale

Advances in multi-modal generative models are enabling new applications, from storytelling to automated media synthesis. Most current workloads generate simple outputs (e.g., image generation from a prompt) in batch mode, often requiring several …

ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving

Large multimodal models (LMMs) demonstrate impressive capabilities in understanding images, videos, and audio beyond text. However, efficiently serving LMMs in production environments poses significant challenges due to their complex architectures …

Towards Efficient Large Multimodal Model Serving

Recent advances in generative AI have led to large multi-modal models (LMMs) capable of simultaneously processing inputs of various modalities such as text, images, video, and audio. While these models demonstrate impressive capabilities, efficiently …