Installing the LLM Serving Engine vLLM on Windows
Prerequisites
An NVIDIA GPU is required, and the CUDA toolkit must be installed.
vllm-windows
GitHub repository: https://github.com/SystemPanic/vllm-windows
Install Dependencies
torch
pip install torch==2.11.0+cu126 torchvision==0.26.0+cu126 torchaudio==2.11.0+cu126 --index-url https://download.pytorch.org/whl/cu126
llguidance xgrammar
pip install llguidance xgrammar
wheel
Download the .whl file from the vllm-windows Releases page and install it:
pip install vllm-0.19.0+cu124-cp312-cp312-win_amd64.whl --extra-index-url https://download.pytorch.org/whl/nightly/cu126
vllm
pip install vllm
Start the Service
vllm serve Qwen/Qwen2.5-1.5B-Instruct
Specifying parameters
vllm serve Qwen/Qwen2.5-1.5B-Instruct --gpu-memory-utilization 0.7 --max-model-len 4096 --max-num-seqs 4
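Once the server is up, it exposes an OpenAI-compatible API on http://localhost:8000 by default. A minimal sketch of a chat request follows; the helper name `build_chat_request` is ours, and the port and model name assume the defaults from the serve command above:

```python
import json
import urllib.request

# Build an OpenAI-compatible chat-completion request for the local vLLM server.
# The base URL (port 8000) and model name follow the serve command above;
# adjust both if you changed them.
def build_chat_request(prompt: str,
                       model: str = "Qwen/Qwen2.5-1.5B-Instruct",
                       base_url: str = "http://localhost:8000/v1"):
    url = f"{base_url}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return url, payload

# To actually send the request once "vllm serve" is running:
# url, payload = build_chat_request("Hello!")
# req = urllib.request.Request(url, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```
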
Possible Errors
Insufficient GPU memory
ValueError: Free memory on device cuda:0 (6.89/8.0 GiB) on startup is less than desired GPU memory utilization (0.9, 7.2 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
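The numbers in the message explain the failure: with the default --gpu-memory-utilization of 0.9, vLLM tries to reserve 0.9 × 8.0 = 7.2 GiB, but only 6.89 GiB is free at startup. A quick arithmetic check of what fits, using the values from the error above:

```python
# Values taken from the error message above.
total_gib = 8.0    # total VRAM on cuda:0
free_gib = 6.89    # free VRAM at startup

def required_gib(utilization: float, total: float = total_gib) -> float:
    """VRAM vLLM will try to reserve for a given --gpu-memory-utilization."""
    return utilization * total

# The default 0.9 does not fit; 0.7 (as in the serve command above) does.
assert required_gib(0.9) > free_gib   # 7.2 GiB needed, only 6.89 free
assert required_gib(0.7) < free_gib   # 5.6 GiB fits
```

Lowering --gpu-memory-utilization (or closing other processes that hold VRAM) is the fix the error itself suggests.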
CUDA not installed
Download CUDA from: https://developer.nvidia.com/cuda-toolkit-archive
ValueError: CUDA_LIB_PATH is not set. CUDA_LIB_PATH need to be set with the absolute path to CUDA root folder on Windows (for example, set CUDA_LIB_PATH=C:\CUDA\v12.4)
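This error appears when the CUDA_LIB_PATH variable is missing. You can set it per session in cmd with `set CUDA_LIB_PATH=...` (as the message suggests), or set it from Python before vLLM starts. A sketch, where the path is the example from the error message and must be replaced with your actual CUDA root:

```python
import os

# Set CUDA_LIB_PATH for the current process before vLLM is imported/started.
# The path below is the example from the error message; replace it with the
# root folder of your actual CUDA installation.
def set_cuda_lib_path(cuda_root: str) -> str:
    os.environ["CUDA_LIB_PATH"] = cuda_root
    return os.environ["CUDA_LIB_PATH"]

set_cuda_lib_path(r"C:\CUDA\v12.4")
```

To make the variable permanent, run `setx CUDA_LIB_PATH "C:\CUDA\v12.4"` in a command prompt or add it under System Properties → Environment Variables.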
Posted: 2026-05-11