Cloud Computing

Reinforcement Learning for Resource Management in Multi-tenant Serverless Platforms

Serverless Function-as-a-Service (FaaS) is an emerging cloud computing paradigm that frees application developers from infrastructure management tasks such as resource provisioning and scaling. To reduce the tail latency of functions and improve …

Is Function-as-a-Service a Good Fit for Latency-Critical Services?

Function-as-a-Service (FaaS) is becoming an increasingly popular cloud-deployment paradigm for serverless computing that frees application developers from managing the infrastructure. At the same time, it allows cloud providers to assert control in …

CoCo: Coordinated Container Scheduling with Last-Level Cache and Memory Bandwidth Partitioning

Last-level cache (LLC) and memory bandwidth partitioning are commonly used in existing work to meet the QoS requirements of all latency-critical applications co-scheduled on a physical server. With the increasing popularity of cloud microservices and the Function-as-a-Service paradigm, the number of containers consolidated on a server has increased significantly. However, due to limitations of the hardware features, existing work cannot support such a large number of applications. To bridge this gap, this project proposes CoCo, coordinated container scheduling with LLC and memory bandwidth partitioning. Our quantitative evaluation shows that CoCo outperforms the no-partitioning and baseline approaches by up to 920% and 9.4%, respectively.

FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices

Modern user-facing, latency-sensitive web services include numerous distributed, intercommunicating microservices that promise to simplify software development and operation. However, multiplexing compute resources across microservices is still …

A Hadoop-like Distributed Computing Platform

This is a parallel distributed computing framework similar to MapReduce/Hadoop. The platform includes critical services such as an underlying distributed file system and a reliable membership protocol. It is implemented in Java and open-sourced on GitHub.

PLOVER: Fast, Multi-core Scalable Virtual Machine Fault-tolerance

Cloud computing enables a vast deployment of online services in virtualized infrastructures, making it crucial to provide fast fault-tolerance for virtual machines (VMs). Unfortunately, despite much effort, achieving fast and multi-core scalable VM …