Best practices for using models that require flash_attn on Apple silicon Macs (or other non-CUDA machines)?

Just as I posted this, I found at least one solution (the monkey-patch approach) that works! A sketch of what I mean is below.
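
For reference, here is a minimal sketch of the kind of monkey patch I have in mind, assuming the problem is a model whose remote code declares a hard `flash_attn` import: temporarily patch `transformers.dynamic_module_utils.get_imports` so the dependency check skips `flash_attn`, then load the model with a non-flash attention implementation. The model id and the `attn_implementation` choice below are placeholders, not from any specific model card.

```python
# Minimal sketch (not an official API): drop the hard flash_attn import that
# transformers detects in a model's remote code, then load the model with a
# plain attention implementation so it runs on MPS/CPU.
from unittest.mock import patch

from transformers import AutoModelForCausalLM
from transformers.dynamic_module_utils import get_imports


def patched_get_imports(filename):
    """Same as transformers' get_imports, but ignore flash_attn if present."""
    imports = get_imports(filename)
    return [imp for imp in imports if imp != "flash_attn"]


with patch("transformers.dynamic_module_utils.get_imports", patched_get_imports):
    model = AutoModelForCausalLM.from_pretrained(
        "some-org/some-flash-attn-model",  # placeholder model id
        trust_remote_code=True,            # required for models shipping custom code
        attn_implementation="eager",       # fall back to plain attention off-CUDA
    )
```

Whether `attn_implementation="eager"` is honored depends on the model's custom code, so this is a workaround rather than a guaranteed fix.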

Can something like this be built into transformers so we don’t have to do it every time?
