Suggest a model to fine-tune for multi-class classification on a 32GB M1 Mac?

Can anyone suggest a model I could use to experiment with fine-tuning for multi-class classification on a 32GB RAM M1 Max MacBook Pro? I’m using the Hugging Face Transformers library with the mps device so training runs on the Mac’s GPU.

I’ve tried bert-base-cased, but its maximum sequence length is only 512 tokens, which isn’t enough for my use case (I need 2k–4k). I’ve tried Mixtral-8x7B-v0.1, but it hangs halfway through the training run. I just tried longformer-base-4096, but it runs out of memory.

Can someone suggest something that has a chance of working?

Edit: if it matters, there are a lot of classes—around 160.

If your use case is multi-class classification, I think you should try SetFit. Hugging Face has a very good starter blog post for it.

You can go with this model:

sentence-transformers/paraphrase-mpnet-base-v2
