Shouldn't `_flash_attn_2_enabled` be documented?

While reading the Llama code, I found out that we can enable flash attention via the `_flash_attn_2_enabled` option at these lines. However, this option does not appear anywhere in `LlamaConfig`.

After a bit of googling, I think that to use flash attention we also need Dao-AILab/flash-attention installed, right?
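
For context, this is roughly how I am toggling it right now (just a sketch on my side: it assumes the private attribute keeps its current name, that flash-attn is installed, and a CUDA GPU with fp16/bf16 support; the model id is only an example):

```python
# pip install flash-attn --no-build-isolation
import torch
from transformers import LlamaConfig, LlamaForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"

config = LlamaConfig.from_pretrained(model_id)
# Undocumented private flag read by the Llama attention modules;
# it may be renamed or removed without notice.
config._flash_attn_2_enabled = True

# Flash attention kernels only run in half precision on CUDA devices.
model = LlamaForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.float16,
).to("cuda")
```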

Hi @dinhanhx, we integrated FlashAttention-2 recently. You can learn more about the integration here. Note that not every model supports FlashAttention-2.
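
For anyone landing here, the supported entry point is `from_pretrained` rather than the private config attribute. A minimal sketch (the model id is just an example; it assumes flash-attn is installed and a CUDA GPU with fp16/bf16 support; the exact keyword has also shifted across releases, with newer versions using `attn_implementation="flash_attention_2"` instead):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# use_flash_attention_2=True routes attention through the FlashAttention-2
# kernels for models that implement them; unsupported models raise an error.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,
).to("cuda")

inputs = tokenizer("Flash attention makes long sequences", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```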