Training on Mac M3 Max… blazing fast, but

Hi All,

I have received my brand new M3 Max and sadly discovered that bitsandbytes is not supported, so I had to adapt my training code to fine-tune Mistral on my dataset:
=> Changed the device to the proper one (mps) :slight_smile:
=> Removed the bnb config
=> Removed load_in_4bit / load_in_8bit
=> Changed the optim to adamw_torch (my previous one was paged_adamw_32bit, which uses bitsandbytes)
=> Changed the batch size to 10… because I do what I want, it's my life

It started processing with an estimated time of 50 min (vs. 6 hrs on my TitanX).
The loss was decreasing as usual… below 1, then suddenly jumped up to 4, then down to 0…

The same code runs fine on my Titan.

Am I the only one experiencing this kind of behaviour on Apple Silicon?
Is anyone using transformers (SFTTrainer) to fine-tune on a Mac Mx?


OK, found the issue: if I set a batch size of 2, everything is fine, the loss is correct and training works; if I put a higher value, random issues… Mmmm… On the Titan (16 GB) I can go up to 4 before I hit the RAM limit. On the 64 GB M3 I cannot, not because of RAM but because of… no idea.

Can you please show your full code? I am so close with my code, but still hitting this same error.
Also, are you loading the model quantized or not?

Head over:

You will find my source for multi-platform training there.

I don’t load it in quantized format.