Haoran Qiu | Microsoft AzRS
GPU Scaling
Power-aware Deep Learning Model Serving with µ-Serve
With the increasing popularity of large deep learning model-serving workloads, there is a pressing need to reduce the energy consumption of a model-serving cluster while still meeting its throughput or model-serving latency requirements. Model …