Basically, you can’t install flash-attention from a plain requirements.txt entry because its source build needs torch to already be installed, and at that point pip hasn’t set torch up yet.
Installing it via subprocess doesn’t work either, and I haven’t been able to get it working through the kernels library.
This approach (installing a prebuilt wheel from requirements.txt) is the right idea, but it fails here because the wheel doesn’t match the environment (as of this writing, Spaces defaults to Python 3.10 and CUDA 12.3).
The newest prebuilt wheel that satisfies these constraints is `https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl`, and you will need to pin torch==2.4.1 in requirements.txt.
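For reference, a minimal requirements.txt sketch of what that looks like (the wheel URL is the one above; the torch pin is based on the wheel being built against torch 2.4, so adjust both to match what your Space actually reports):

```
# Pin torch to the version the prebuilt flash-attn wheel was compiled against (2.4.x)
torch==2.4.1

# Prebuilt flash-attn wheel for Python 3.10 / CUDA 12.x / torch 2.4.
# Because this is a binary wheel, pip never runs flash-attn's source build,
# so the "torch not found" failure from building from source doesn't apply.
flash-attn @ https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```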
The wheel version may well be WRONG by the time you read this: check the Python and CUDA versions in the build logs and find the matching wheel. If you can’t or don’t want to do that, prebuild the wheel locally (ideally in a separate conda environment) and upload it to your Spaces repo for installation.
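If you do go the local-build route, here is a rough sketch (the env name and the Python/torch versions are placeholders; match them to your Space’s build logs, and you also need a local CUDA toolkit of a compatible version):

```bash
# Throwaway conda env mirroring the Space (versions here are examples, not guarantees)
conda create -n flash-attn-build python=3.10 -y
conda activate flash-attn-build

# Install the same torch you will pin in the Space's requirements.txt
pip install torch==2.4.1

# Build the wheel; --no-build-isolation lets flash-attn's setup see the installed torch
pip wheel flash-attn --no-build-isolation --no-deps -w dist/
```

Commit the resulting `dist/*.whl` to the Space repo and reference it from requirements.txt as a relative path (e.g. `./flash_attn-<version>-cp310-cp310-linux_x86_64.whl`); that should resolve fine since the build runs from the repo root.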
I genuinely hope this is not necessary