https://x.com/vllm_project/status/1942450223881605593 showed that it's possible to run vLLM in a free-threaded Python interpreter. That involved a lot of custom work to get dependencies to build, and the situation now (4 months later) is a lot better. This is meant as a tracking issue for all the dependencies of vLLM - to start with on Linux x86-64 (CPU and CUDA) - and as a "work list" for getting the remaining issues with those dependencies resolved. The goal is for `uv pip install vllm` to do the right thing out of the box in a clean 3.14t environment.
Python 3.14t is necessary, both because CPython 3.14t itself is much more stable than 3.13t and because there are a number of important packages (e.g., `cffi`, `aiohttp`) that do support 3.14t but won't support 3.13t. Of course this depends on vLLM itself supporting Python 3.14 (the default, with-GIL build) first - see gh-26994 and gh-34096 for that.
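For anyone following along, a quick sanity check that you are actually on a free-threaded build (a minimal sketch; `sys._is_gil_enabled()` only exists on free-threaded CPython, so it's looked up defensively):

```python
import sys
import sysconfig

# True when the interpreter was built with --disable-gil (i.e. 3.13t/3.14t).
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() only exists on free-threaded builds; on a regular
# (with-GIL) build we fall back to reporting the GIL as enabled.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```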
This is the dependency graph for vLLM, generated from the default dependencies of the latest release (0.11.0) on PyPI. Packages with compiled code are marked green if they have free-threading wheels on PyPI and red if they don't (easier to browse in a fresh browser tab):

PyTorch 2.9.0 has 3.14/3.14t wheels marked as "preview"; 2.10.0 in January will contain full support.
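The dependency graph above was generated from PyPI metadata; a much smaller sketch of the same idea is walking the declared dependencies of an installed distribution with the stdlib's `importlib.metadata` (this prints raw requirement strings, not a rendered graph - the helper name is made up for illustration):

```python
import importlib.metadata

def direct_requirements(dist_name: str) -> list[str]:
    """Declared dependencies of an installed distribution, as raw requirement strings."""
    try:
        return importlib.metadata.requires(dist_name) or []
    except importlib.metadata.PackageNotFoundError:
        return []

# Example with a distribution that's usually present alongside Python tooling:
print(direct_requirements("pip"))
```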
On Linux x86-64, here are all the packages from the `build`, `cpu` and `common` requirements files that don't cleanly install from PyPI:

- `msgspec`: has support in its `main` branch, installs fine from source. EDIT 27 Nov '25: `msgspec` 0.20.0 has support and wheels on PyPI.
- `numba`: ~~a release with support is being worked on, see numba#9928. Release planned for January, says the Numba team.~~ EDIT: `numba` 0.65.0 added full cp314t support and wheels.
- `opencv-python-headless`: no support yet, tracking issue is opencv#27933. Builds from source if the build dependencies from `pyproject.toml` are installed (latest versions, ignore the pins) and then built with `export ENABLE_HEADLESS=1 && uv pip install . --no-build-isolation`.
- `outlines-core`: latest version (0.2.14) builds fine from source after outlines-core#235, and the next release will have cp314t wheels (xref outlines-core#248).
- `llguidance`: ~~doesn't build~~ now builds from source on `main` after llguidance#255, needs some fixes (xref llguidance#256). EDIT: `llguidance >= 1.6.0` has complete support and cp314t wheels.
- `openai-harmony`: latest version on PyPI builds fine from source (xref harmony#87 for support/wheels).
- `safetensors`: builds from source (`main` branch).
- `xgrammar`: latest version (0.1.27) builds fine from source on Linux. xref xgrammar#500 for full support/wheels. EDIT: 0.1.31 has full support and cp314t wheels.
- `tokenizers`: latest version on PyPI builds fine from source.
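As releases land, an easy way to re-check whether a given package now ships cp314t wheels is PyPI's JSON API. A small sketch (the helper names are made up; the PyPI query needs network access):

```python
import json
import urllib.request

def cp314t_wheels(filenames):
    """Keep only wheel filenames carrying the free-threaded cp314t tag."""
    return [f for f in filenames if f.endswith(".whl") and "cp314t" in f]

def latest_release_has_cp314t(package: str) -> bool:
    """Ask PyPI (network required) whether the latest release ships cp314t wheels."""
    with urllib.request.urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        files = json.load(resp)["urls"]
    return bool(cp314t_wheels(f["filename"] for f in files))

# Offline example, using the wheel-filename tag format PyPI uses:
names = [
    "msgspec-0.20.0-cp314-cp314t-manylinux_2_17_x86_64.whl",
    "msgspec-0.20.0-py3-none-any.whl",
]
print(cp314t_wheels(names))  # only the cp314t-tagged wheel
```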
And the CUDA-specific dependencies that don't yet have support:

- `flashinfer-python`: no upstream issue yet; ~~uses stable ABI so doesn't build yet~~ resolved by flashinfer-ai/flashinfer#1687 (misc: Do not use the limited API with free-threaded Python).
- `ray`: ~~revisit after 3.14 support (xref ray#56434) has landed~~ EDIT: `ray` was made optional, so this is no longer blocking. Ray still doesn't even have regular 3.14 support as of 10 April 2026.
- `xformers`: no support yet, not expected soon (xref xformers#1345). EDIT: supported as of `xformers` 0.0.35, by removing any dependency on the CPython C API.
All other packages install fine. Note that some dependencies have optional C extensions that aren't working yet, so those packages will currently be pulled in as pure Python (e.g., `cbor2` and `protobuf`). And there are packages with known thread-safety issues (e.g., `msgpack`, `protobuf`). Those packages may have wheels or seem to install fine, but do not yet have a check in the "PyPI release" column in https://py-free-threading.github.io/tracking/.
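One way to spot the pure-Python-fallback situation mentioned above is to scan an installed package for compiled extension modules. This is a rough heuristic sketch (the helper name is made up), not an official API:

```python
import importlib.util
import pathlib

def has_compiled_extension(package: str) -> bool:
    """Heuristic: does an installed package ship any .so/.pyd extension modules?"""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return False
    for location in spec.submodule_search_locations:
        root = pathlib.Path(location)
        if any(root.rglob("*.so")) or any(root.rglob("*.pyd")):
            return True
    return False

# The stdlib's `json` package is pure Python (its accelerator, _json,
# lives outside the package directory), so this reports False:
print(has_compiled_extension("json"))  # False
```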
So as of today, everything except for `llguidance` is installable from PyPI or from source with a modest bit of effort. Something along these lines for `cpu`:
```bash
uv venv --python=3.14t venv-vllm
source venv-vllm/bin/activate
uv pip install torch torchaudio torchvision triton --torch-backend cpu
# Comment out `torch` in `build.txt`
uv pip install -r requirements/build.txt
# Now install the dependencies in the checklist above one by one.
# You need GCC and a Rust compiler installed.
# Comment out `llguidance` in `common.txt`; also need to loosen or remove some == pins
uv pip install -r requirements/common.txt
```
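Even when everything installs, CPython re-enables the GIL at import time if an extension module doesn't declare free-threading support (it emits a warning, but that's easy to miss in a long install log), so it's worth checking after the steps above. A hedged sketch - the package names are just examples from the lists in this issue:

```python
import sys

def gil_enabled() -> bool:
    # sys._is_gil_enabled() only exists on free-threaded builds.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()

for name in ["msgspec", "tokenizers", "safetensors"]:
    try:
        __import__(name)
    except ImportError:
        print(f"{name}: not installed, skipping")
        continue
    print(f"{name}: GIL enabled after import -> {gil_enabled()}")
```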
After all that is done, it should be possible to build vLLM itself. PRs to get that to work: