Training On Mac M3 Max.. blazing fast but

pechaut · November 28, 2023, 5:42pm

Hi All,

I have received my brand new M3 Max and sadly discovered that BitsAndBytes is not supported on it, so I had to adapt my training code to fine-tune Mistral on my dataset:
=> Changed the device to the proper device
=> Removed the bnb config
=> Removed the load-in-4-bit / 8-bit flags
=> Changed the optim to adamw_torch (my previous one was a paged 32-bit optimizer, which relies on BitsAndBytes)
=> Changed the batch size to 10… because I do what I want, it’s my life
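
For readers following along, the changes listed above can be sketched roughly as below. This is only an illustration, not the poster's actual code: `pick_device` and `training_overrides` are hypothetical helper names, and the `load_in_4bit` key is just a flag for your own loading code to read. The optimizer strings `adamw_torch` and `paged_adamw_32bit` are values accepted by the `optim` argument of transformers' `TrainingArguments`.

```python
def pick_device():
    """Return "cuda", "mps" or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch installed we simply fall back to CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def training_overrides(device):
    """Hypothetical helper: per-platform settings to merge into a training config."""
    if device == "cuda":
        # BitsAndBytes is usable here: quantized loading and a paged optimizer.
        return {"load_in_4bit": True, "optim": "paged_adamw_32bit"}
    # Apple Silicon (or CPU): no bnb config, no 4/8-bit loading, plain AdamW.
    return {"load_in_4bit": False, "optim": "adamw_torch"}


if __name__ == "__main__":
    device = pick_device()
    print(device, training_overrides(device))
```

Keeping the platform-specific choices in one place means the rest of the training script can stay identical on both machines.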

It started processing with an estimated time of 50 min (6 hrs on my Titan X).
The loss was decreasing as usual… below 1, then suddenly jumped up to 4, then dropped to 0…

The same code runs fine on my Titan.

Am I the only one to experience this kind of behaviour on Apple Silicon?
Is anyone using transformers (FFTrainer) to fine-tune on a Mac Mx?

Cheers

pechaut · November 29, 2023, 3:41pm

OK, found the issue: if I set a batch size of 2, the loss is correct and training is fine. If I use a higher value… random issues… Mmmm… I can go up to 4 on the Titan (16 GB) before I hit the RAM limit. I cannot on a 64 GB M3… not because of RAM but because of… no idea.
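
If a larger effective batch is still wanted while the per-device batch stays at 2, gradient accumulation is one common workaround. This was not tried in this thread, and it assumes the instability really does depend only on the per-device batch size. `accumulation_steps` is a hypothetical helper; `per_device_train_batch_size` and `gradient_accumulation_steps` are real `TrainingArguments` parameters in transformers.

```python
def accumulation_steps(target_batch, per_device_batch):
    """Number of micro-batches to accumulate to reach at least target_batch."""
    if target_batch <= 0 or per_device_batch <= 0:
        raise ValueError("batch sizes must be positive")
    # Ceiling division, so the effective batch is never below the target.
    return -(-target_batch // per_device_batch)


if __name__ == "__main__":
    steps = accumulation_steps(target_batch=10, per_device_batch=2)
    # These two values would be passed to TrainingArguments as
    # per_device_train_batch_size and gradient_accumulation_steps.
    print({"per_device_train_batch_size": 2, "gradient_accumulation_steps": steps})
```

With a per-device batch of 2 and 5 accumulation steps, each optimizer update sees 10 samples, matching the batch size of 10 from the first post.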

Tsomerville · December 24, 2023, 6:47am

Can you please show your full code? I am so close with my code, but I am still getting this same error.
Also, are you loading the model quantized or not?

pechaut · December 24, 2023, 11:08am

head over:

You will get my source for multiplatform training.

I don’t load the model in quantized format.
