Fine-Tuning Llama 3.2 1B Quantized: Memory Requirements

I see, the dataset could also be a cause…
Well, best practices for datasets are probably covered somewhere in this forum or on GitHub if you search for them…:sweat_smile:

Also, depending on the model, gradient checkpointing may not be available (it should be supported in Llama 3.2 1B, though…), and there may still be lingering bugs in multi-GPU environments.
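To illustrate what gradient checkpointing does at the PyTorch level, here's a minimal sketch with a toy block standing in for a transformer layer (the layer sizes are made up for the example). Activations inside the checkpointed block are recomputed during backward instead of being stored, which is the memory/compute trade-off that matters here:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Toy block standing in for a transformer layer (sizes are illustrative).
layer = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 64),
)

x = torch.randn(4, 64, requires_grad=True)

# Activations inside `layer` are recomputed in backward rather than stored.
y = checkpoint(layer, x, use_reentrant=False)
loss = y.sum()
loss.backward()
print(x.grad.shape)  # gradients still flow as usual
```

With the `transformers` `Trainer`, the equivalent switch is usually just `gradient_checkpointing=True` in `TrainingArguments` (or `model.gradient_checkpointing_enable()`).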

When trying to isolate the issue, it’s usually faster to temporarily switch to a smaller, simpler model or dataset.
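For that kind of isolation, one common pattern is a debug flag in the training script that swaps in a tiny model and a subsample of the dataset. This is just a sketch; the variable names are illustrative, and the tiny model here (`sshleifer/tiny-gpt2`) is only one possible stand-in:

```python
# Hypothetical debug switch in a training script; names are illustrative.
DEBUG = True

# A tiny model makes OOM and multi-GPU issues easy to tell apart from
# model-specific problems; swap back to the real model once things run.
model_name = "sshleifer/tiny-gpt2" if DEBUG else "meta-llama/Llama-3.2-1B"

# Subsample the dataset while debugging (None = use everything).
max_samples = 100 if DEBUG else None

print(model_name, max_samples)
```

If the small setup trains cleanly, the problem is likely in the model size, quantization config, or dataset scale rather than in the training loop itself.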