Where did you read it would save you memory? Training with mixed precision will be faster, but it does not save memory when you train large models: instead of having one model in FP32 in GPU RAM, you get one copy in FP32 and one copy in FP16 (so 1.5 times the memory for the weights). You do save a bit because the activations are in FP16 instead of FP32, but that's not always enough to let you increase the batch size.
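For context, here is roughly what mixed-precision training looks like with PyTorch's `torch.cuda.amp` (a minimal sketch with a toy model and random data, not from the original question): the parameters stay in FP32, and the forward/backward passes run eligible ops in FP16, which is where the extra half-precision copies and the FP16 activations come from.

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Toy model and data purely for illustration (assumed, not from the original post).
model = nn.Linear(1024, 1024).cuda()       # parameters live in FP32 (the "master" weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
scaler = GradScaler()                      # loss scaling to avoid FP16 gradient underflow

for step in range(10):
    inputs = torch.randn(32, 1024, device="cuda")
    targets = torch.randn(32, 1024, device="cuda")

    optimizer.zero_grad()
    with autocast():                       # eligible ops run in FP16 here,
        outputs = model(inputs)            # so activations are stored in half precision
        loss = loss_fn(outputs, targets)

    scaler.scale(loss).backward()          # backward pass through the scaled loss
    scaler.step(optimizer)                 # the optimizer still updates the FP32 weights
    scaler.update()
```

So the FP32 weights never go away; the FP16 side is added on top of them, which is why the weight memory goes up rather than down.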