Prebuilt FlashAttention 2.8.0.post2 wheel for NVIDIA L40S (CUDA 12.1)
Hi everyone!
After a long compile session, I successfully built FlashAttention 2.8.0.post2 for the NVIDIA L40S (Ada Lovelace) using CUDA 12.1, PyTorch 2.9, and Python 3.10. To save others a few hours of build time, I'm sharing the ready-to-use prebuilt .whl file.
Tested environment:
GPU: NVIDIA L40S (Compute Capability 8.9 / sm_89)
CUDA: 12.1
PyTorch: 2.9.0
Python: 3.10
OS: Ubuntu 22.04 (Google Cloud VM)
Wheel size: ~111.5 MiB
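Since a prebuilt wheel only works in an environment matching its build target, the checklist above can be sketched as a quick pre-install guard. This is a hypothetical stdlib-only helper, not part of the wheel itself; the function name and checks are my own:

```python
# Hypothetical pre-install guard based on the tested environment above.
# The wheel targets Python 3.10, CUDA 12.1, and compute capability 8.9 (sm_89).
def wheel_matches(py_version, cuda_version, compute_capability):
    """Return True if the local environment matches the wheel's build target."""
    return (
        py_version[:2] == (3, 10)            # Python 3.10.x
        and cuda_version.startswith("12.1")  # CUDA 12.1
        and compute_capability == (8, 9)     # L40S / Ada Lovelace, sm_89
    )

# Values from the environment described in this post:
print(wheel_matches((3, 10, 12), "12.1", (8, 9)))  # expected: True
```

In a real session you would feed it `sys.version_info`, `torch.version.cuda`, and `torch.cuda.get_device_capability()`.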
Download / Repository:
https://github.com/h1312200313122003-code/flash-attn-prebuilt-L40S/releases/tag/v2.8.0.post2-cu121-l40s
Everything runs smoothly with GPU acceleration: verified import, a quick benchmark, and full compatibility with PyTorch 2.9.0.
Hopefully this saves others a few hours and headaches compiling on L40S!
Tagging for visibility:
#FlashAttention #CUDA #PyTorch #L40S #AdaLovelace