Hi, I am trying to enable Flash Attention 2 on a model, but I get this error:
```
ValueError: past key must have a shape of (`batch_size, num_heads, self.config.sliding_window-1, head_dim`), got torch.Size([4, 8, 3968, 128])
```
I am using OpenChat's openchat_3.5 7B model, which I believe is based on Mistral (openchat/openchat_3.5 · Hugging Face). I am loading the model like this:
```python
import torch
from transformers import AutoModelForCausalLM

model_name = "openchat/openchat_3.5"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # current API; use_flash_attention_2=True is the deprecated form
    low_cpu_mem_usage=True,
)
```
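In case versions matter: the error mentions `self.config.sliding_window`, so I assume it is tied to Mistral's sliding-window attention. Here is the sanity check I would run on my setup (a minimal sketch, assuming `flash_attn` is importable and the config exposes a `sliding_window` field):

```python
import flash_attn
import transformers
from transformers import AutoConfig

# Version info, since sliding-window support only landed in
# flash-attn 2.3.0 and the Mistral code paths have changed
# across transformers releases
print("flash-attn:", flash_attn.__version__)
print("transformers:", transformers.__version__)

# The error message points at config.sliding_window, so check
# what the model actually ships with (4096 for Mistral-style models)
config = AutoConfig.from_pretrained("openchat/openchat_3.5")
print("sliding_window:", config.sliding_window)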
```

Can someone explain this error to me, or point me to a resource that can help me understand the problem? Thank you.