Haoran Qiu | Microsoft AzRS
Long Context
Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations
As large language models (LLMs) handle increasingly longer contexts, serving long inference requests of millions of tokens presents unique challenges. We show that existing work for long context inference is largely based on techniques from long …