Can anyone suggest a model I could use to experiment with fine-tuning for multi-class classification on a 32GB RAM M1 Max MacBook Pro? I'm using the Hugging Face Transformers library with the mps device so it runs on the Mac's GPU.
I've tried bert-base-cased, but its context window is only 512 tokens, which isn't enough for my use case; I need 2k-4k. I've tried Mixtral-8x7B-v0.1, but it hangs halfway through the training run. I just tried longformer-base-4096, but it runs out of memory.
Can someone suggest something that has a chance of working?
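For reference, this is roughly the shape of what I'm running. It's a sketch, not my actual training script: it builds a tiny randomly initialized Longformer classification head from a config (so it runs offline with no weight download), where the real run uses AutoModelForSequenceClassification.from_pretrained("allenai/longformer-base-4096", num_labels=160) and a Trainer on top.

```python
import torch
from transformers import LongformerConfig, LongformerForSequenceClassification

# Use Apple's Metal GPU backend when available, else fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Sketch config: tiny dimensions so it instantiates quickly, but with a
# ~4k position limit (like longformer-base-4096) and my label count (~160).
config = LongformerConfig(
    attention_window=64,           # tiny window, just for this sketch
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    max_position_embeddings=4098,  # ~4k context
    num_labels=160,                # my class count
)
model = LongformerForSequenceClassification(config).to(device)

# One dummy batch to confirm the logits shape matches the class count.
input_ids = torch.randint(0, config.vocab_size, (1, 1024)).to(device)
logits = model(input_ids=input_ids).logits
print(logits.shape)  # torch.Size([1, 160])
```

The real script is the same pattern with pretrained weights, a tokenizer with max_length=4096, and a training loop; memory blows up once the pretrained model plus optimizer state plus long-sequence activations are all resident.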
Edit: if it matters, there are a lot of classes—around 160.