The emergence of large language models (LLMs) has introduced excessive computational demands and unique execution patterns (i.e., nondeterministic execution time due to autoregressive patterns) for cloud providers. Consequently, existing LLM serving …