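vLLM is an open-source inference and serving engine designed to run large language models efficiently. It manages GPU memory for the attention key-value cache in fixed-size blocks (a technique called PagedAttention) and batches incoming requests continuously rather than waiting for a full batch to finish. It is often used to serve large language models at scale.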
| First released | 2023 |
| Developed by | UC Berkeley |
| Open-source | Yes |
vLLM is known for improving throughput and reducing costs when serving large language models.
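
The memory-management idea behind that throughput gain can be illustrated with a toy sketch. The Python below is not vLLM's implementation or API; the class `ToyBlockAllocator` and its methods are hypothetical names made up for this example. It only shows the general principle of giving each request small fixed-size cache blocks on demand instead of reserving one large contiguous region up front.

```python
import math


class ToyBlockAllocator:
    """Hands out fixed-size cache blocks to sequences on demand (illustration only)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # ids of unused blocks
        self.block_tables = {}  # sequence id -> list of block ids it owns
        self.lengths = {}       # sequence id -> number of tokens stored

    def append_tokens(self, seq_id, n_tokens):
        """Grow a sequence by n_tokens, allocating new blocks only when needed."""
        table = self.block_tables.get(seq_id, [])
        new_len = self.lengths.get(seq_id, 0) + n_tokens
        needed = math.ceil(new_len / self.block_size) - len(table)
        if needed > len(self.free_blocks):
            # Assumption: a real server would queue or preempt the request instead.
            raise MemoryError("no free blocks left")
        for _ in range(needed):
            table.append(self.free_blocks.pop())
        self.block_tables[seq_id] = table
        self.lengths[seq_id] = new_len

    def free(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


alloc = ToyBlockAllocator(num_blocks=4, block_size=16)
alloc.append_tokens("a", 20)  # 20 tokens need 2 blocks of 16
alloc.append_tokens("b", 5)   # 5 tokens need 1 block
alloc.append_tokens("a", 10)  # 30 tokens still fit in the same 2 blocks
alloc.free("a")               # both blocks become reusable by other requests
```

Because blocks are small and returned to the pool as soon as a sequence finishes, little memory sits reserved but unused, which leaves room to serve more requests at once.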