Hi all,
Is there currently a way to extract the attention attribute from a model such as GPT-2 and swap it with Flash-Attention?
Thank you,
Enrico
From what I have read, I think you can multiply the positional embeddings, but it's not empirically tested.
I forgot to close this out; I resolved it a while ago. You can swap the attention layers by building a wrapper.
Can you share your code for swapping the standard attention with FlashAttention in HF models?
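The original code wasn't posted, but here is a minimal sketch of the wrapper approach described above. It is an illustration, not the poster's code: the `FlashGPT2Attention` class name is made up, and it routes attention through PyTorch's `scaled_dot_product_attention` (PyTorch >= 2.0), which dispatches to a FlashAttention kernel on supported GPUs, rather than the standalone `flash_attn` package. It also ignores KV caching, attention masks, and head masking, so it suits training or full-sequence forward passes; generation with `use_cache=True` would need more plumbing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import GPT2LMHeadModel
from transformers.models.gpt2.modeling_gpt2 import GPT2Attention


class FlashGPT2Attention(nn.Module):
    """Wraps an existing GPT2Attention module, reusing its weights."""

    def __init__(self, orig: GPT2Attention):
        super().__init__()
        self.c_attn = orig.c_attn        # fused q/k/v projection (Conv1D)
        self.c_proj = orig.c_proj        # output projection
        self.resid_dropout = orig.resid_dropout
        self.num_heads = orig.num_heads
        self.head_dim = orig.head_dim
        self.embed_dim = orig.embed_dim

    def forward(self, hidden_states, **kwargs):
        # **kwargs swallows layer_past / attention_mask / etc.; this sketch
        # does not implement caching or padding masks.
        bsz, seq_len, _ = hidden_states.size()
        # Project once, then split into query / key / value.
        q, k, v = self.c_attn(hidden_states).split(self.embed_dim, dim=2)
        # Reshape to (batch, heads, seq, head_dim) as SDPA expects.
        shape = (bsz, seq_len, self.num_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        # is_causal=True reproduces GPT-2's causal mask; the Flash kernel
        # is selected automatically when dtype/hardware allow it.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(bsz, seq_len, self.embed_dim)
        attn = self.resid_dropout(self.c_proj(attn))
        # GPT2Block expects a tuple; None stands in for the attn weights.
        return attn, None


model = GPT2LMHeadModel.from_pretrained("gpt2")
# Replace each block's attention module in place.
for block in model.transformer.h:
    block.attn = FlashGPT2Attention(block.attn)
```

The exact return-tuple shape that `GPT2Block` expects varies across transformers versions, so check your installed version before relying on this.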