Power Management

TAPAS: Thermal- and Power-Aware Scheduling for LLM Inference in Cloud Platforms

The rising demand for generative large language models (LLMs) poses challenges for thermal and power management in cloud datacenters. Traditional techniques are often inadequate for LLM inference due to the fine-grained, millisecond-scale execution …

SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud

Operating server components beyond their voltage and power design limits (i.e., overclocking) can improve performance and lower cost for cloud workloads. However, overclocking can significantly degrade component lifetime, increase power …

PARM: Adaptive Resource Allocation for Datacenter Power Capping

Energy efficiency is a pressing concern in today's cloud datacenters. Various power management strategies, such as oversubscription, power capping, and dynamic voltage and frequency scaling, have been proposed and are in use by datacenter operators to better …