Why does setting `--fp16 True` not save memory as expected?

Where did you read that it would save you memory? Training with mixed precision will be faster, but it does not save memory when you train large models: instead of having one copy of the model in FP32 in GPU RAM, you get one copy in FP32 plus one copy in FP16 (so about 1.5 times the memory for the weights). You do save a bit because the activations are stored in FP16 instead of FP32, but that's not always enough to let you increase the batch size.
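
To make the arithmetic concrete, here is a rough sketch (my own illustration, not from the Trainer code) that only counts the weight copies and ignores gradients, optimizer states, and activations:

```python
def weight_memory_gb(num_params: int, fp16: bool = False) -> float:
    """Rough GPU memory taken by the model weights alone.

    FP32 weights use 4 bytes per parameter. With mixed precision you keep
    the FP32 master weights *and* an FP16 copy (4 + 2 = 6 bytes per
    parameter), hence ~1.5x the weight memory. Gradients, optimizer
    states, and activations are not counted here.
    """
    bytes_per_param = 4 + (2 if fp16 else 0)
    return num_params * bytes_per_param / 1024**3


# Example: a 1.5B-parameter model
print(f"FP32 only:       {weight_memory_gb(1_500_000_000):.1f} GB")
print(f"Mixed precision: {weight_memory_gb(1_500_000_000, fp16=True):.1f} GB")
```

For a 1.5B-parameter model that is roughly 5.6 GB of weights in FP32 versus 8.4 GB with the extra FP16 copy, which is why the savings have to come from activations (and speed), not from the model itself.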
