Cloud Computing | Haoran Qiu

FLASH: Fast Model Adaptation in ML-Centric Cloud Platforms

The emergence of ML in various cloud system management tasks (e.g., workload autoscaling and job scheduling) has become a core driver of ML-centric cloud platforms. However, there are still numerous algorithmic and systems challenges that prevent …

When Green Computing Meets Performance and Resilience SLOs

This paper addresses the urgent need to transition to global net-zero carbon emissions by 2050 while retaining the ability to meet joint performance and resilience objectives. The focus is on the computing infrastructures, such as hyperscale cloud …

SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud

Operating server components beyond their voltage and power design limits (i.e., overclocking) enables improving performance and lowering cost for cloud workloads. However, overclocking can significantly degrade component lifetime, increase power …

AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems

Workload autoscaling is widely used in public and private cloud systems to maintain stable service performance and save resources. However, it remains challenging to set the optimal resource limits and dynamically scale each workload at runtime. …

SIMPPO: A Scalable and Incremental Online Learning Framework for Serverless Resource Management

Serverless Function-as-a-Service (FaaS) offers improved programmability for customers, yet it is not server-"less" and comes at the cost of more complex infrastructure management (e.g., resource provisioning and scheduling) for cloud providers. To …

A Mean-Field Game Approach to Cloud Resource Management with Function Approximation

Reinforcement learning (RL) has gained increasing popularity for resource management in cloud services such as serverless computing. As self-interested users compete for shared resources in a cluster, the multi-tenancy nature of serverless platforms …

Reinforcement Learning for Resource Management in Multi-tenant Serverless Platforms

Serverless Function-as-a-Service (FaaS) is an emerging cloud computing paradigm that frees application developers from infrastructure management tasks such as resource provisioning and scaling. To reduce the tail latency of functions and improve …

Is Function-as-a-Service a Good Fit for Latency-Critical Services?

Function-as-a-Service (FaaS) is becoming an increasingly popular cloud-deployment paradigm for serverless computing that frees application developers from managing the infrastructure. At the same time, it allows cloud providers to assert control in …

CoCo: Coordinated Container Scheduling with Last-Level Cache and Memory Bandwidth Partitioning

Last-level cache (LLC) and memory bandwidth partitioning are commonly used in existing work to meet QoS requirements of all co-scheduled latency-critical applications consolidated on a physical server. With the increasing popularity of cloud microservices and Function-as-a-Service paradigm, the number of containers consolidated together increases significantly. However, due to the limitation of hardware features, existing work fails to support such number of applications. To bridge this gap, this project proposes CoCo, coordinated container scheduling with LLC and Memory bandwidth partitioning. Our quantitative evaluation shows that CoCo outperforms no-partitioning and baseline approaches by up to 920% and 9.4% respectively.

FIRM: An Intelligent Fine-Grained Resource Management Frameworkfor SLO-Oriented Microservices

Modern user-facing, latency-sensitive web services include numerous distributed, intercommunicating microservices that promise to simplify software development and operation. However, multiplexing compute-resources across microservices is still …