Shouldn't `_flash_attn_2_enabled` be documented?

While reading the Llama code, I found out that we can enable flash attention via the `_flash_attn_2_enabled` option at these lines. However, this option does not appear anywhere in `LlamaConfig`.

After a bit of googling, I think that to use flash attention we also need Dao-AILab/flash-attention installed, right?
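
For context, this is roughly how I am toggling it right now (just a sketch on my side: it assumes the private attribute keeps its current name, that flash-attn is installed, and a CUDA GPU with fp16/bf16 support; the model id is only an example):

```python
# pip install flash-attn --no-build-isolation
import torch
from transformers import LlamaConfig, LlamaForCausalLM

model_id = "meta-llama/Llama-2-7b-hf"

config = LlamaConfig.from_pretrained(model_id)
# Undocumented private flag read by the Llama attention modules;
# it may be renamed or removed without notice.
config._flash_attn_2_enabled = True

# Flash attention kernels only run in half precision on CUDA devices.
model = LlamaForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.float16,
).to("cuda")
```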

Hi @dinhanhx, we integrated FlashAttention-2 recently. You can learn more about the integration here. Note that not every model supports FlashAttention-2.
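
For anyone landing here, the supported entry point is `from_pretrained` rather than the private config attribute. A minimal sketch (the model id is just an example; it assumes flash-attn is installed and a CUDA GPU with fp16/bf16 support; the exact keyword has also shifted across releases, with newer versions using `attn_implementation="flash_attention_2"` instead):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# use_flash_attention_2=True routes attention through the FlashAttention-2
# kernels for models that implement them; unsupported models raise an error.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True,
).to("cuda")

inputs = tokenizer("Flash attention makes long sequences", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```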