Hi,
I would like to ask whether it is possible to load and fine-tune Phi-3 Small 8k. When I load the model, I get an error about FlashAttention being missing, and when I try to install the flash-attn package, the installation fails with this error:
RuntimeError: FlashAttention is only supported on CUDA 11.6 and above. Note: make sure nvcc has a supported version by running nvcc -V.
torch.__version__ = 2.3.1+cu121
But I have the required versions of PyTorch and CUDA (torch 2.3.1 and CUDA 12.1).
Is it because I am using a Tesla V100 GPU? Is there any way to load the model on this GPU anyway?
I found this in the documentation for Phi-3 Mini on Hugging Face:
If you want to run the model on:
NVIDIA V100 or earlier generation GPUs: call AutoModelForCausalLM.from_pretrained() with attn_implementation="eager"
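For reference, this is how I read that recommendation for the Mini model (the model ID is only my assumption for illustration, not something I copied from my setup):

from transformers import AutoModelForCausalLM

# Load with the "eager" attention implementation instead of FlashAttention,
# as the model card suggests for V100 or earlier GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",  # assumed model ID
    attn_implementation="eager",
)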
Does this also apply to Phi-3 Small 8k? Because when I try to load it that way, I get the error below:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "path", num_labels=num_labels, attn_implementation="eager"
)
AssertionError: Flash Attention is not available, but is needed for dense attention
Or should I try the ONNX version, or is that intended only for inference?
Thank you.