Load Phi 3 small on Nvidia Tesla V100 - Flash Attention

Hi,
I would like to ask about loading and fine-tuning Phi-3-small-8k. When I load the model, I get an error about missing Flash Attention, and when I try to install the package, I get this error:

RuntimeError: FlashAttention is only supported on CUDA 11.6 and above.  Note: make sure nvcc has a supported version by running nvcc -V.


      torch.__version__  = 2.3.1+cu121

But I have the required versions of PyTorch and CUDA (torch 2.3.1 and CUDA 12.1).
Is it because I am using a Tesla V100 graphics card? Is there any way to load the model with this card?
I found this in the documentation for Phi-3-mini on Hugging Face:

If you want to run the model on:
NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with attn_implementation="eager"

Does this also apply to Phi-3-small-8k? Because when I try to load it like this, the error occurs:

model = AutoModelForSequenceClassification.from_pretrained("path", num_labels=num_labels, attn_implementation="eager")

AssertionError: Flash Attention is not available, but is needed for dense attention

Or should I try the ONNX version, or is that only for inference?
Thank you.

The V100 is a Volta-generation GPU with compute capability 7.0, which is not supported by flash-attention. You need an Ampere or later GPU.
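For context, flash-attention gates on the GPU's compute capability, which you can read with torch.cuda.get_device_capability() on a machine with PyTorch installed. A minimal sketch of that check (the helper name is mine, not part of any library):

```python
def supports_flash_attention(capability):
    """flash-attention requires Ampere (SM 8.0) or newer."""
    major, _minor = capability
    return major >= 8

# torch.cuda.get_device_capability() reports (7, 0) on a V100 (Volta)
print(supports_flash_attention((7, 0)))  # False -> V100 cannot use flash-attn
print(supports_flash_attention((8, 0)))  # True  -> A100 (Ampere) can
```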

Based on transformers/src/transformers/models/phi3/modeling_phi3.py (huggingface/transformers on GitHub, commit 47c29ccfaf56947d845971a439cbe75a764b63d7), you should be able to run it in "eager" mode. Make sure you have the latest transformers library.
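Also note that the RuntimeError about CUDA 11.6 comes from flash-attention's build step checking your system nvcc, not the CUDA runtime bundled into the pip torch wheel (cu121) — so the two can disagree. A quick way to sanity-check what nvcc -V reports, assuming its usual output format (the parsing helper is mine):

```python
import re

# Example `nvcc -V` output line (assumed format); flash-attention's build
# checks this system compiler version, not torch's bundled CUDA runtime.
sample = "Cuda compilation tools, release 11.4, V11.4.48"

def nvcc_release(text):
    """Parse the 'release X.Y' version out of `nvcc -V` output."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None

print(nvcc_release(sample))           # (11, 4)
print(nvcc_release(sample) >= (11, 6))  # False -> would trigger the build error
```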

@antonpolishko
Thank you for the response.
Yes, I have the latest transformers library and I tried this:
model = AutoModelForSequenceClassification.from_pretrained("path", num_labels=num_labels, attn_implementation="eager")
But I still get the error:
AssertionError: Flash Attention is not available, but is needed for dense attention
Do you have any idea what to do?

I opened an issue on GitHub at transformers: "Unable to load model in eager mode."