Modern user-facing, latency-sensitive web services include numerous distributed, intercommunicating microservices that promise to simplify software development and operation. However, multiplexing compute resources across microservices is still challenging in production because contention for shared resources can cause latency spikes that violate the service-level objectives (SLOs) of user requests. This paper presents FIRM, an intelligent fine-grained resource management framework for predictable sharing of resources across microservices to drive up overall utilization. FIRM leverages online telemetry data and machine-learning methods to adaptively (i) detect/localize microservices that cause SLO violations, (ii) identify low-level resources in contention, and (iii) take actions to mitigate SLO violations by dynamic re-provisioning. Experiments across four microservice benchmarks demonstrate that FIRM reduces SLO violations by up to 16x while reducing the overall requested CPU limit by up to 62%. Moreover, FIRM improves performance predictability by reducing tail latencies by up to 11x.
USENIX Badges: Artifacts Available, Artifacts Functional & Results Reproduced.