vLLM

High-throughput and memory-efficient inference and serving engine for Large Language Models. Deploy AI faster with state-of-the-art performance.

vllm.ai