Fine-tuning a small LLM on 32 GB RAM, 4 vCPUs

Is it possible to use microsoft/Phi-3-mini-4k-instruct on CPU only? (I have an i3.xlarge Databricks cluster at work with 4 vCPUs and 32 GB of memory.) The model card says this implementation uses flash attention, so I tried downloading microsoft/Phi-3-mini-4k-instruct-onnx instead. I'm hitting a few errors there, but I just want to double-check that I'm using the correct implementation, and whether I can simply set `device_map` to "auto" or "cpu" when I don't have a GPU instance.
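For what it's worth, here is a minimal sketch of how I understand the CPU-only load path would look with the regular (non-ONNX) checkpoint. The helper name `cpu_safe_load_kwargs` is something I made up for illustration; the kwargs themselves (`device_map`, `attn_implementation`) are the ones recent versions of `transformers` accept in `from_pretrained`, where `attn_implementation="eager"` should sidestep flash attention (which needs a CUDA GPU):

```python
# Hypothetical helper: pick from_pretrained kwargs for a CPU-only box.
def cpu_safe_load_kwargs(has_gpu: bool = False) -> dict:
    """Return load kwargs for AutoModelForCausalLM.from_pretrained."""
    if has_gpu:
        # With a GPU, let accelerate place weights and use flash attention.
        return {"device_map": "auto", "attn_implementation": "flash_attention_2"}
    # Flash attention requires a CUDA GPU; fall back to the eager
    # attention implementation and pin everything to CPU.
    return {"device_map": "cpu", "attn_implementation": "eager"}


# Usage sketch (not run here -- downloads several GB of weights):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "microsoft/Phi-3-mini-4k-instruct", **cpu_safe_load_kwargs()
# )
```

Note that `device_map="auto"` relies on `accelerate` being installed; on a machine with no GPU it would resolve to CPU anyway, but being explicit with `"cpu"` avoids any ambiguity.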

The background: I was trying to fine-tune an open-source model, but without a GPU it seems like a pain…