Conference · Atlanta, USA

Help! My LLM is a Resource Hog: How We Tamed Inference with Kubernetes and Open Source Muscle

KubeCon + CloudNativeCon North America 2025

Kubernetes · LLM · GPU · Inference · CNCF

Abstract

LLM inference is the new resource hog. GPUs sit underutilised, model loading dominates cold-start time, and teams ship workloads that look fine in isolation but fall over the moment another tenant lands on the same node. This KubeCon NA 2025 session walks through how we tamed inference on Kubernetes using open source primitives — sensible scheduling, GPU sharing strategies, and tenant isolation patterns that prevent one model from starving another. Expect a tour of the trade-offs between vertical scaling, multi-instance GPUs, and tenant clusters, with examples drawn from production deployments.
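As a taste of the GPU sharing strategies the session covers, here is a minimal sketch of one such pattern: a Pod that requests a single MIG (Multi-Instance GPU) partition instead of an entire GPU, using the extended resource name exposed by the NVIDIA device plugin. The pod name and image are hypothetical placeholders, and the `1g.5gb` profile is just one example slice size.

```yaml
# Sketch only: one inference Pod claiming a MIG slice rather than a full GPU,
# so several tenants can share a single physical device without starving each other.
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference          # hypothetical name
spec:
  containers:
  - name: server
    image: example.com/llm-server:latest   # placeholder image
    resources:
      limits:
        nvidia.com/mig-1g.5gb: "1"   # one MIG partition (NVIDIA device plugin resource)
        memory: 16Gi
        cpu: "4"
```

Because the MIG slice is a first-class schedulable resource, the Kubernetes scheduler places the Pod only on nodes advertising a free partition, which is what keeps one tenant's model from landing on, and starving, another's.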
