The full story takes a while to explain, but put simply: installing FlashAttention can be quite a challenge depending on your environment.
I’ll break this into:
- What exactly your error means
- The real causes (there are two)
- Concrete solutions you can choose from (with commands)
1. What your error actually means
You already have Torch:
Successfully installed ... torch-2.6.0+cu124 torchaudio-2.6.0+cu124 torchvision-0.21.0+cu124
But when installing flash_attn==2.7.4.post1 you get:
Installing build dependencies ... done
Getting requirements to build wheel ... error
...
ModuleNotFoundError: No module named 'torch'
ERROR: Failed to build 'flash_attn'
and again via requirements.txt:
Collecting flash-attn==2.7.4.post1 (from -r requirements.txt)
...
Getting requirements to build wheel ... error
ModuleNotFoundError: No module named 'torch'
Key points:
- Torch is installed in your Conda env.
- The error is thrown inside a temporary build environment that pip creates just for building flash_attn. That temporary env does not see your torch.
- This is a known, common problem with FlashAttention: PyPI's page explicitly says that if you see ModuleNotFoundError: No module named 'torch', it is "likely because of pypi's installation isolation" and tells you to use pip install flash-attn --no-build-isolation. (PyPI)
On top of that, you are on Windows, and FlashAttention is a heavy CUDA/C++ extension that is mainly tested on Linux. A lot of people hit additional problems building it from source on Windows and solve it via prebuilt wheels or WSL2. (GitHub)
So:
- Immediate cause: pip's build-isolation environment cannot import torch.
- Structural cause: FlashAttention is a compiled CUDA extension with weak native Windows support.
2. Background: what pip is doing and why Torch “disappears”
Modern pip uses PEP 517 build isolation for packages that declare a pyproject.toml (FlashAttention does).
When you run:
pip install flash_attn==2.7.4.post1
pip does roughly:
- Reads pyproject.toml and sees a build backend (e.g. setuptools) plus a list of build requirements.
- Creates a temporary virtual environment in a temp directory (like your pip-build-env-gqm7d9al).
- Installs only the declared build-time dependencies there (setuptools, wheel, and whatever else the project lists).
- Runs the build backend inside that isolated env to:
  - compute build requirements (get_requires_for_build_wheel),
  - generate the wheel.
Inside this isolated build env, torch is not installed, so import torch fails with ModuleNotFoundError: No module named 'torch'.
This exact pattern shows up in multiple issues on the FlashAttention repo and Stack Overflow, with identical logs (Installing build dependencies ... done, Getting requirements to build wheel ... error, then ModuleNotFoundError: No module named 'torch'). (GitHub)
The FlashAttention maintainers solved it by telling people to turn off build isolation:
pip install flash-attn --no-build-isolation
which forces pip to build using your real environment, where Torch is installed. (PyPI)
So the error message is misleading: Torch is installed, but not in the small private env pip uses for building.
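To make that concrete, here is a minimal illustration of the pattern FlashAttention's build follows (a simplified sketch, not its actual setup.py): any build script that imports torch at module level can only run in an environment where Torch is already installed, and pip's isolated build env is not such an environment.

# simplified sketch of a CUDA-extension setup.py, not FlashAttention's real file
from setuptools import setup
import torch  # fails with ModuleNotFoundError inside pip's isolated build env
from torch.utils.cpp_extension import CUDAExtension, BuildExtension

setup(
    name="some_cuda_extension",  # hypothetical package name
    ext_modules=[CUDAExtension("some_cuda_extension._C", ["csrc/kernels.cu"])],
    cmdclass={"build_ext": BuildExtension},
)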
3. Background: why FlashAttention is fussy, especially on Windows
FlashAttention is not a typical pure-Python library:
- It provides custom fused CUDA kernels for attention, tightly tied to:
  - specific PyTorch versions,
  - specific CUDA versions,
  - GPU architectures. (PyPI)
- It is primarily developed and tested on Linux; Windows support exists but is patchy and depends on custom wheels or manual builds. (GitHub)
For Linux, the recommended path is:
- Use the right PyTorch+CUDA combination (you already have Torch 2.6.0+cu124).
- Install FlashAttention with pip install flash-attn --no-build-isolation. (PyPI)
For Windows:
- Many users report that building from source is slow and fragile (hours of compile time, Visual Studio / NVCC configuration, etc.). Guides suggest:
  - using MSVC Build Tools, the CUDA toolkit, and Ninja,
  - running pip in a VS "x64 Native Tools" prompt,
  - setting MAX_JOBS and other environment variables. (Reddit)
- Because of this, some people now publish pre-built Windows wheels just for FlashAttention:
  - mjun0812/flash-attention-prebuild-wheels (Linux + Windows, including FlashAttention 2.7.4.post1 with PyTorch 2.5–2.9 and CUDA 12.4–13.0). (GitHub)
  - flash_attn_windows by petermg on GitHub (wheels built for Windows, advertised on Reddit). (Reddit)
So building from source on Windows is possible, but often not worth the headache compared to installing a matching wheel.
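For reference, a typical from-source attempt on Windows looks roughly like this. This is a sketch only, assuming MSVC Build Tools, the CUDA 12.4 toolkit, and Ninja are already installed, and that the commands run in an "x64 Native Tools" prompt where conda is on PATH:

rem rough sketch of a Windows source build; adapt to your setup
conda activate longcat-video
pip install ninja packaging psutil
rem limit parallel compile jobs so nvcc does not exhaust RAM
set MAX_JOBS=4
pip install flash_attn==2.7.4.post1 --no-build-isolation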
4. LongCat-Video’s specific requirements
The official LongCat-Video README says to do exactly what you did: (GitHub)
# create conda environment
conda create -n longcat-video python=3.10
conda activate longcat-video
# install torch (configure according to your CUDA version)
pip install torch==2.6.0+cu124 torchvision==0.21.0+cu124 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
# install flash-attn-2
pip install ninja
pip install psutil
pip install packaging
pip install flash_attn==2.7.4.post1
# install other requirements
pip install -r requirements.txt
And then notes:
FlashAttention-2 is enabled in the model config by default; you can also change the model config (./weights/LongCat-Video/dit/config.json) to use FlashAttention-3 or xformers once installed. (GitHub)
So:
- Your versions (Python 3.10 + torch 2.6.0+cu124 + flash_attn 2.7.4.post1) are exactly what LongCat expects.
- The only reason you fail is the packaging / platform story around flash_attn itself.
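A quick way to confirm your env really does match that stack, using only standard Torch attributes:

python -c "import sys, torch; print(sys.version.split()[0], torch.__version__, torch.version.cuda, torch.cuda.is_available())"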
5. Concrete causes in your case
Let’s spell them out clearly and tie them directly to your logs.
Cause A: pip build isolation hides Torch from FlashAttention
- Torch is installed in C:\Users\Owner\anaconda3\envs\longcat-video\lib\site-packages.
- Pip's build env lives under C:\Users\Owner\AppData\Local\Temp\pip-build-env-....
- FlashAttention's setup is executed inside this temp env when pip runs get_requires_for_build_wheel. It tries import torch and fails because Torch is not installed there.
- Result: ModuleNotFoundError: No module named 'torch' and Failed to build 'flash_attn'.
Exactly this situation is documented in:
- FlashAttention's PyPI page, with the warning and the recommended --no-build-isolation fix. (PyPI)
- GitHub issues #309, #1920, etc., where users see identical logs and are told to install Torch first and then use pip install flash-attn --no-build-isolation. (GitHub)
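Before trying any fix, you can confirm that Torch really is visible in the activated env (and where it lives):

conda activate longcat-video
python -c "import torch; print(torch.__version__, torch.__file__)"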
Cause B: there is no ready-made flash_attn wheel for your Windows platform
- On Windows, pip often cannot find a prebuilt wheel on PyPI for your exact combination (PyTorch 2.6.0 + CUDA 12.4 + Python 3.10 + flash_attn 2.7.4.post1), so it falls back to building from source.
- Building from source on Windows:
  - needs MSVC, the CUDA toolkit, Ninja, and the right environment variables,
  - is known to be error-prone and slow. (GitHub)
So even after you fix build isolation, you may still hit C++/CUDA build errors unless you either:
- install a matching prebuilt wheel, or
- move to Linux/WSL2 and follow the official instructions there, or
- skip FlashAttention entirely and change LongCat’s attention backend.
6. Solutions you can realistically choose from
There is no single “magic” fix; there are a few solid paths. I’ll list them from “least invasive” to “most robust”.
Option 1 — Try the official fix: build in your real env (no isolation)
This directly addresses the ModuleNotFoundError: No module named 'torch'.
- Activate your env:
  conda activate longcat-video
- Make sure basic build tools are up to date:
  python -m pip install --upgrade pip setuptools wheel
- Install FlashAttention without build isolation:
  python -m pip install --no-build-isolation "flash-attn==2.7.4.post1"
  or (same thing, different spelling):
  python -m pip install --no-build-isolation "flash_attn==2.7.4.post1"
This is exactly what the FlashAttention docs and multiple answers recommend when you see the “No module named ‘torch’” error. (PyPI)
- If that succeeds, then:
  python -m pip install -r requirements.txt
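If the build finishes, a quick import check confirms the extension actually loads (flash_attn_func is flash-attn's functional interface):

python -c "from flash_attn import flash_attn_func; print('flash_attn imports OK')"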
What can still go wrong:
- On Windows, this may now progress past the “torch missing” phase and then fail with MSVC/NVCC errors, RAM issues, etc. Guides warn that building from source can take 1–3+ hours and is sensitive to toolchain setup. (Reddit)
If your goal is to experiment quickly and not debug compilers, you may prefer Options 2 or 3 instead.
Option 2 — Install a prebuilt FlashAttention wheel for your exact stack
This avoids compiling on Windows entirely.
There is a dedicated repo mjun0812/flash-attention-prebuild-wheels that publishes prebuilt wheels for Linux and Windows. The latest release (v0.4.19) explicitly lists Windows wheels for:
- FlashAttention: 2.7.4.post1, 2.8.3
- Python: 3.10, 3.11, 3.12, 3.13
- PyTorch: 2.5, 2.6, 2.7, 2.8, 2.9
- CUDA: 12.4, 12.6, 13.0 (GitHub)
That perfectly matches:
- Python 3.10
- Torch 2.6.0+cu124
- CUDA 12.4
- FlashAttention 2.7.4.post1
Typical procedure:
- Stay with your current env:
  conda activate longcat-video
- Visit the v0.4.19 release page of mjun0812/flash-attention-prebuild-wheels. (GitHub)
- Download the .whl whose filename encodes:
  - flash_attn-2.7.4.post1
  - cp310 (Python 3.10)
  - torch2.6 (or similar)
  - cu124
  - win_amd64
- Install it directly:
  python -m pip install "C:\path\to\flash_attn-2.7.4.post1-...-cp310-cp310-win_amd64.whl"
- Verify:
  python -c "import flash_attn; print('flash_attn OK')"
- Then:
  python -m pip install -r requirements.txt
This way, pip sees that flash-attn==2.7.4.post1 is already installed and does not try to build it again.
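If you want a slightly stronger check than the import test, you can run a single attention call on the GPU. This is a minimal sketch, assuming flash_attn_func's usual (batch, seqlen, nheads, headdim) fp16 layout; skip it if your GPU is not supported by FlashAttention-2:

# flash_smoke_test.py (hypothetical file name); run with: python flash_smoke_test.py
import torch
from flash_attn import flash_attn_func  # flash-attn's functional interface

# tiny fp16 tensors on the GPU, laid out as (batch, seqlen, nheads, headdim)
q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)

out = flash_attn_func(q, k, v, causal=True)
print("flash_attn kernel ran, output shape:", tuple(out.shape))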
Alternatives: the petermg flash_attn_windows wheels mentioned earlier are another source, if they happen to cover your exact Python/Torch/CUDA combination. (Reddit)
Tradeoffs:
- You must ensure the wheel’s Python, Torch, and CUDA versions match your environment exactly. If they don’t, you’ll get import or runtime errors.
- You’re relying on community-built wheels rather than official PyPI wheels, but this is a common route people take to get FlashAttention working on Windows. (GitHub)
Option 3 — Skip FlashAttention entirely and use a different backend
LongCat-Video’s README explicitly says: (GitHub)
FlashAttention-2 is enabled in the model config by default; you can also change the model config (./weights/LongCat-Video/dit/config.json) to use FlashAttention-3 or xformers once installed.
And in practice you can also fall back to PyTorch's native scaled-dot-product attention (SDPA) if you are okay with slower generation (a sketch of that fallback appears at the end of this option).
Steps:
- Edit requirements.txt in the LongCat repo and comment out the FlashAttention line:
  # flash-attn==2.7.4.post1
- Install everything else:
  conda activate longcat-video
  python -m pip install -r requirements.txt
- After you download the model weights (via huggingface-cli as in the README), open:
  ./weights/LongCat-Video/dit/config.json
- Look for the fields controlling the attention backend (naming may vary, e.g. attn_backend, use_flash_attn) and change them so they do not expect FlashAttention:
  - If there is a "backend" or "attn_backend" key set to "flash_attn_2", change it to something like "native" or "pytorch" (or "xformers" if you install xFormers and it has Windows wheels for your Torch/CUDA).
  - Some configs may have "use_flash_attn": true; change it to false.
  The exact key names depend on their implementation, but the README confirms that the backend is configurable and that FlashAttention-2 is just the default. (GitHub)
- Run the demos as usual:
  torchrun run_demo_text_to_video.py --checkpoint_dir=./weights/LongCat-Video --enable_compile
Downsides:
- Generation will be slower, especially for long high-res videos, because you’re not using fused attention kernels.
- But for testing, experimentation, and medium-length videos, this is usually perfectly acceptable and significantly easier than wrestling with FlashAttention on Windows.
This is often the most pragmatic tradeoff: you get LongCat working now, and you can always revisit FlashAttention later.
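For reference, the fallback in question is just PyTorch's built-in kernel. Here is a minimal illustration of the call that replaces flash_attn_func when FlashAttention is absent (illustrative only, not LongCat's actual code; note that SDPA expects a (batch, nheads, seqlen, headdim) layout):

# illustrative only: PyTorch-native attention, no FlashAttention required
import torch
import torch.nn.functional as F

batch, nheads, seqlen, headdim = 1, 8, 128, 64
q = torch.randn(batch, nheads, seqlen, headdim, device="cuda", dtype=torch.float16)
k = torch.randn(batch, nheads, seqlen, headdim, device="cuda", dtype=torch.float16)
v = torch.randn(batch, nheads, seqlen, headdim, device="cuda", dtype=torch.float16)

# PyTorch dispatches to its own flash / memory-efficient / math kernels as available
out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
print(tuple(out.shape))  # (1, 8, 128, 64)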
Option 4 — Use Linux or WSL2 and follow the README literally
LongCat-Video and FlashAttention are primarily tuned for Linux + CUDA. The README you’re following assumes that environment. (GitHub)
A very robust approach is:
- Use a Linux environment (bare metal or WSL2 with GPU passthrough).
- Install CUDA 12.4 and the exact LongCat stack:
  - Python 3.10
  - torch 2.6.0+cu124, torchvision, torchaudio
  - pip install flash-attn==2.7.4.post1 --no-build-isolation
  - pip install -r requirements.txt
- Optionally, use a preconfigured guide such as the DigitalOcean tutorial, which walks through setting up LongCat-Video on their GPU droplets with the right drivers/tooling. (DigitalOcean)
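If you go the WSL2 route, it is worth confirming the GPU is actually visible inside the Linux environment before installing the heavy pieces (standard commands, nothing LongCat-specific; the second one assumes Torch is already installed):

# inside the WSL2 / Linux shell
nvidia-smi
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"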
Pros:
- You’re running in the environment the authors and FlashAttention maintainers actually test.
- Fewer platform-specific surprises.
Cons:
- Requires either a dual-boot/WSL2 setup or a remote Linux machine.
7. Minor side notes about your environment
Not blockers now, but worth being aware of:
- You installed:
  - psutil-7.1.3 manually, while requirements.txt wants psutil==6.0.0,
  - numpy-2.1.2, while requirements.txt later pulls numpy==1.26.4.
Pip will try to adjust these during pip install -r requirements.txt. This is usually fine, but mixing many manual installs with a strict requirements.txt can sometimes cause confusing version downgrades or binary-ABI mismatches (especially for numpy). Keeping a clean environment that only follows the README plus any deliberate change (like removing FlashAttention) reduces noise when you debug.
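Once pip install -r requirements.txt has finished, pip's built-in consistency check is a cheap way to spot any leftover version conflicts:

python -m pip check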
8. Short recap (causes and solutions)
Causes
- The ModuleNotFoundError: No module named 'torch' during the flash_attn install is caused by pip's build isolation: FlashAttention's build runs inside a temporary env that doesn't have Torch, so import torch fails, even though Torch is installed in your Conda env. (PyPI)
- On Windows, pip often needs to compile FlashAttention from source because there’s no official wheel for your exact Python/Torch/CUDA combination on PyPI, and that compilation is complex and fragile. (GitHub)
Solutions
- Option 1: Try the official fix: python -m pip install --no-build-isolation flash-attn==2.7.4.post1 in your longcat-video env (then pip install -r requirements.txt). This fixes the "torch not found" piece but may still hit Windows compilation issues. (PyPI)
- Option 2: Use a prebuilt Windows wheel for your exact stack from mjun0812/flash-attention-prebuild-wheels (FlashAttention 2.7.4.post1, Python 3.10, Torch 2.6, CUDA 12.4), install it via pip install path\to\wheel.whl, then run pip install -r requirements.txt. (GitHub)
- Option 3: Skip FlashAttention entirely:
  - remove flash-attn from requirements.txt,
  - change ./weights/LongCat-Video/dit/config.json to use a non-FlashAttention backend (native or xFormers),
  - accept slower but much simpler installs on Windows. (GitHub)
- Option 4: Run LongCat-Video in a Linux/WSL2 environment (local or cloud) where the exact torch + FlashAttention combo is officially tested, following the README as-is. (GitHub)
Option 4: Run LongCat-Video in a Linux/WSL2 environment (local or cloud) where the exact torch + FlashAttention combo is officially tested, following the README as-is. (GitHub)