Mistral from Hugging Face is slow

I am trying to run distillation on the Mistral model, using the Hugging Face implementation with flash attention. The model runs exceptionally slowly, and the GPUs are stuck at 0% utilization. Is this normal?
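One thing that could be checked first is whether the weights are actually on the GPU, since 0% utilization often goes with a model that is still sitting on the CPU. Below is a minimal sketch of such a check. The helper names `parameter_device_counts` and `looks_like_cpu_only` are made up for illustration; the only assumption is that the model exposes `.parameters()` and each parameter has a `.device`, as PyTorch modules do.

```python
from collections import Counter


def parameter_device_counts(model):
    """Count how many parameter tensors live on each device.

    Works with any object that exposes .parameters() yielding items
    with a .device attribute (PyTorch nn.Module objects do).
    """
    return Counter(str(p.device) for p in model.parameters())


def looks_like_cpu_only(model):
    """Return True if the model has parameters and none are on a CUDA device."""
    counts = parameter_device_counts(model)
    return len(counts) > 0 and all(not d.startswith("cuda") for d in counts)
```

Calling `parameter_device_counts(model)` on the loaded teacher and student models should show something like `cuda:0` for every parameter. If it shows `cpu` instead, the slowness is probably not specific to Mistral or flash attention.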
