While reading the Llama code, I found that we can use flash attention via the flash_attn_2_enabled option
at these lines. However, this option does not appear in LlamaConfig.
After a bit of googling, I think that to use flash attention we also need Dao-AILab/flash-attention installed. Is that right?
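If I understand correctly, something like the sketch below is how it would be enabled, assuming the flash-attn package from Dao-AILab/flash-attention is installed (e.g. via pip install flash-attn) and a transformers release that exposes the flag through from_pretrained rather than LlamaConfig. The model id and the exact keyword argument here are just my guesses and may differ between versions:

```python
# Minimal sketch, not verified: assumes flash-attn (Dao-AILab/flash-attention)
# is installed and a transformers version that accepts this from_pretrained flag.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example checkpoint (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,    # flash attention expects fp16/bf16
    use_flash_attention_2=True,   # passed to from_pretrained, not set in LlamaConfig
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Is this the intended way to turn it on, or is there a config field I'm missing?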