**Haoran Qiu**

Latest

StreamWise: Serving Multi-Modal Generation in Real-Time at Scale
From Models to Operators: Rethinking Autoscaling Granularity for Large Generative Models
Sherlock: Reliable and Efficient Agentic Workflow Execution
Murakkab: Resource-Efficient Agentic Workflow Orchestration in Cloud Platforms
ModServe: Modality- and Stage-Aware Resource Disaggregation for Scalable Multimodal Model Serving
Medha: Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations
Towards Efficient Large Multimodal Model Serving
TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms
INDIGO: Page Migration for Hardware Memory Disaggregation Across a Network
Power-aware Deep Learning Model Serving with µ-Serve
QLM: Queue Management for Large Language Model Serving
FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms
When Green Computing Meets Performance and Resilience SLOs
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction
SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud
On the Promise and Challenges of Foundation Models for Learning-based Cloud Systems Management
PARM: Adaptive Resource Allocation for Datacenter Power Capping
AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems
Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity
SIMPPO: A Scalable and Incremental Online Learning Framework for Serverless Resource Management
A Mean-Field Game Approach to Cloud Resource Management with Function Approximation
Evaluating Hardware Memory Disaggregation under Delay and Contention
Reinforcement Learning for Resource Management in Multi-tenant Serverless Platforms
A Geography-Based P2P Overlay Network for Fast and Robust Blockchain Systems
Is Function-as-a-Service a Good Fit for Latency-Critical Services?
Delay Sensitivity-driven Congestion Mitigation for HPC Systems
FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices
OWL: Understanding and Detecting Concurrency Attacks
PLOVER: Fast, Multi-core Scalable Virtual Machine Fault-tolerance