:rocket: Prebuilt FlashAttention 2.8.0.post2 wheel for NVIDIA L40S (CUDA 12.1)

Hi everyone! :waving_hand:

After a long compile session, I successfully built FlashAttention 2.8.0.post2
for NVIDIA L40S (Ada Lovelace) using :brain: CUDA 12.1, :puzzle_piece: PyTorch 2.9, and :snake: Python 3.10.

To save others a few hours of build time :hourglass_not_done:, I’m sharing the ready-to-use prebuilt .whl file. :high_voltage:


:white_check_mark: Tested environment:

  • :desktop_computer: GPU: NVIDIA L40S (Compute Capability 8.9 / sm_89)
  • :brain: CUDA: 12.1
  • :puzzle_piece: PyTorch: 2.9.0
  • :snake: Python: 3.10
  • :cloud: OS: Ubuntu 22.04 (Google Cloud VM)
  • :floppy_disk: Wheel size: ~111.5 MiB

:package: Download / Repository:
:backhand_index_pointing_right: https://github.com/h1312200313122003-code/flash-attn-prebuilt-L40S/releases/tag/v2.8.0.post2-cu121-l40s

Everything runs smoothly with GPU acceleration: verified import, benchmark run, and full compatibility :white_check_mark:
Hopefully this saves others a few hours and headaches compiling on L40S! :dashing_away:
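If you want to sanity-check your own GPU before installing, here is a minimal sketch of the kind of compatibility check involved. It assumes FlashAttention 2.x requires compute capability 8.0 (Ampere) or newer, which the L40S satisfies at sm_89; the helper name is mine, not part of the flash-attn API:

```python
# Sketch: check whether a GPU's compute capability can run FlashAttention 2.
# Assumption: FA2 requires compute capability >= 8.0 (Ampere/Ada/Hopper).

def flash_attn_supported(compute_capability):
    """Return True if a (major, minor) compute capability tuple meets the
    assumed FlashAttention 2 minimum of sm_80."""
    return tuple(compute_capability) >= (8, 0)

# On a live machine you would query the device instead of hard-coding, e.g.:
#   import torch
#   cc = torch.cuda.get_device_capability(0)  # reports (8, 9) on an L40S
#   print(flash_attn_supported(cc))

print(flash_attn_supported((8, 9)))  # L40S (sm_89)
print(flash_attn_supported((7, 5)))  # e.g. a Turing T4 would not qualify
```

After installing the wheel, a quick `python -c "import flash_attn; print(flash_attn.__version__)"` confirms the import works, which is the first of the checks mentioned above.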


:bookmark: Tagging for visibility:
#FlashAttention #CUDA #PyTorch #L40S #AdaLovelace
