Flash attention has no effect on inference

Changing the transformers version doesn't seem to affect anything.

I found out that I had some NVTX calls which added overhead, but even after removing them, Flash Attention is still slower on Mistral. I've slightly modified the script you linked in your comment. These are the results:

(screenshot of benchmark results)
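
For reference, this is roughly the kind of timing comparison I mean. It's a minimal sketch, not the exact modified script: the checkpoint name, prompt, and token count below are placeholders I picked for illustration.

```python
# Minimal sketch: time Mistral generation with and without Flash Attention 2.
# The checkpoint, prompt, and NEW_TOKENS are assumed values, not the ones
# from the script linked above.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint
PROMPT = "The quick brown fox"            # placeholder prompt
NEW_TOKENS = 256

def benchmark(attn_implementation: str) -> float:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float16,
        attn_implementation=attn_implementation,  # "flash_attention_2" or "eager"
    ).to("cuda")
    inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")

    # Warm-up run so kernel compilation/caching is not included in the timing.
    model.generate(**inputs, max_new_tokens=NEW_TOKENS, do_sample=False)

    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=NEW_TOKENS, do_sample=False)
    torch.cuda.synchronize()  # wait for all GPU work before stopping the clock
    return time.perf_counter() - start

for impl in ("eager", "flash_attention_2"):
    print(f"{impl}: {benchmark(impl):.2f}s for {NEW_TOKENS} new tokens")
```

Note the explicit `torch.cuda.synchronize()` calls and the warm-up pass; without them (or with NVTX ranges left in) the timings can be misleading.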