Enabling Flash Attention 2

What is the difference between using Flash Attention 2 via

model = AutoModelForCausalLM.from_pretrained(ckpt, attn_implementation="sdpa")

vs

model = AutoModelForCausalLM.from_pretrained(ckpt, attn_implementation="flash_attention_2")

when PyTorch SDPA supports FA2 according to the docs?

@marcsun13


@ybelkada Can you shed some light on this?

@varadhbhatnagar were you able to figure out the difference?