Publications

ATC'24
Power-aware Deep Learning Model Serving with µ-Serve.

In Proceedings of the 2024 USENIX Annual Technical Conference (ATC 2024).
  Artifact Available, Functional, Reproduced

PDF Preprint Code Slides Video

DSN'24
When Green Computing Meets Performance and Resilience SLOs.

In Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2024 Disrupt Track).
  Selected at the NSF Workshop on Sustainable Computing for Sustainability 2024

PDF DOI

AIOps'24 @ASPLOS'24
QLM: Queue Management for Large Language Model Serving.

In Proceedings of the 5th International Workshop on Cloud Intelligence / AIOps (AIOps 2024).

PDF Preprint

MLSys Workshop @NeurIPS'23
On the Promise and Challenges of Foundation Models for Learning-based Cloud Systems Management.

In Proceedings of the 7th Workshop on ML for Systems at NeurIPS 2023 (MLSys Workshop 2023).
  Selected for Spotlight Presentation

PDF Slides

MLSys Workshop @NeurIPS'23
PARM: Adaptive Resource Allocation for Datacenter Power Capping.

In Proceedings of the 7th Workshop on ML for Systems at NeurIPS 2023 (MLSys Workshop 2023).

PDF

ATC'23
AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems.

In Proceedings of the 2023 USENIX Annual Technical Conference (ATC 2023).
  Artifact Available, Functional, Reproduced
  Selected for Presentation at KubeCon + CloudNativeCon North America 2023

PDF Code Slides Video

NeurIPS'23
Multi-Agent Meta-Reinforcement Learning: Sharper Convergence Rates with Task Similarity.

In Proceedings of the 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023).

PDF

NeurIPS'22
A Mean-Field Game Approach to Cloud Resource Management with Function Approximation.

In Proceedings of the 36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022).

PDF Slides Video

CompSys Workshop @IPDPS'22
Evaluating Hardware Memory Disaggregation under Delay and Contention.

In Proceedings of the 1st Workshop on Composable Systems Co-located with IPDPS 2022 (COMPSYS 2022).
  Best Presentation Award

PDF Video DOI

EuroMLSys Workshop @EuroSys'22
Reinforcement Learning for Resource Management in Multi-tenant Serverless Platforms.

In Proceedings of the 2nd European Workshop on Machine Learning and Systems Co-located with EuroSys 2022 (EuroMLSys 2022).

PDF Slides Video DOI

WoSC @Middleware'21
Is Function-as-a-Service a Good Fit for Latency-Critical Services?

In Proceedings of the 7th International Workshop on Serverless Computing Co-located with ACM/IFIP Middleware 2021 (WoSC7).

PDF Code Slides Video DOI

OSDI'20
FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices.

In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2020).
  Artifact Available, Functional, Reproduced
  Selected for the ACM ICPE 2024 Data Challenge

PDF Code Dataset Slides Video DOI

DSN'18
OWL: Understanding and Detecting Concurrency Attacks.

In Proceedings of the 48th IEEE International Conference on Dependable Systems and Networks (DSN 2018).
  CVE-2017-12193, CVE-2017-7533

PDF Code Slides DOI

NSDI'18
PLOVER: Fast, Multi-core Scalable Virtual Machine Fault-tolerance.

In Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2018).

PDF Code Slides Video DOI