Docker Blog

vLLM has quickly become the go-to inference engine for developers who need high-throughput LLM serving. We first brought vLLM to Docker Model Runner for NVIDIA GPUs on Linux, then extended it to Windows via WSL2. macOS has been the missing piece, and that changes today: Docker Model Runner now supports vllm-metal, a new backend that brings v…