Archit Patke

INDIGO: Page Migration for Hardware Memory Disaggregation Across a Network
Power-aware Deep Learning Model Serving with µ-Serve
QLM: Queue Management for Large Language Model Serving
FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
SIMPPO: A Scalable and Incremental Online Learning Framework for Serverless Resource Management
Evaluating Hardware Memory Disaggregation under Delay and Contention
Reinforcement Learning for Resource Management in Multi-tenant Serverless Platforms
Delay Sensitivity-driven Congestion Mitigation for HPC Systems