Machine Learning

Delay Sensitivity-driven Congestion Mitigation for HPC Systems

Modern high-performance computing (HPC) systems concurrently execute multiple distributed applications that contend for the high-speed network leading to congestion. Consequently, application runtime variability and suboptimal system utilization are …

FIRM: An Intelligent Fine-Grained Resource Management Frameworkfor SLO-Oriented Microservices

Modern user-facing, latency-sensitive web services include numerous distributed, intercommunicating microservices that promise to simplify software development and operation. However, multiplexing compute-resources across microservices is still …

AutoMice: A Testbed Framework for Self-Driving Systems

AutoMice is designed to ease the transition from testbed validation to deployment in production by using two abstraction layers on both the input and output of a self-driving system.