Best practices for using models that require flash_attn on Apple silicon Macs (or other non-CUDA machines)?

Just as I posted this, I found at least one solution (the monkey-patch approach) that works! A sketch of what I mean is below.
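
For reference, here is a minimal sketch of the kind of monkey patch I have in mind, assuming the problem is a model whose remote code declares a hard `flash_attn` import: temporarily patch `transformers.dynamic_module_utils.get_imports` so the dependency check skips `flash_attn`, then load the model with a non-flash attention implementation. The model id and the `attn_implementation` choice below are placeholders, not from any specific model card.

```python
# Minimal sketch (not an official API): drop the hard flash_attn import that
# transformers detects in a model's remote code, then load the model with a
# plain attention implementation so it runs on MPS/CPU.
from unittest.mock import patch

from transformers import AutoModelForCausalLM
from transformers.dynamic_module_utils import get_imports


def patched_get_imports(filename):
    """Same as transformers' get_imports, but ignore flash_attn if present."""
    imports = get_imports(filename)
    return [imp for imp in imports if imp != "flash_attn"]


with patch("transformers.dynamic_module_utils.get_imports", patched_get_imports):
    model = AutoModelForCausalLM.from_pretrained(
        "some-org/some-flash-attn-model",  # placeholder model id
        trust_remote_code=True,            # required for models shipping custom code
        attn_implementation="eager",       # fall back to plain attention off-CUDA
    )
```

Whether `attn_implementation="eager"` is honored depends on the model's custom code, so this is a workaround rather than a guaranteed fix.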

Can something like this be built into transformers so we don’t have to do it every time?
