Where did you read it would save you memory? Training with mixed precision will be faster, but it does not save memory when you train large models: instead of having one model in FP32 in GPU RAM, you get one copy in FP32 and one copy in FP16 (so 1.5 times the memory for the weights). You do save a bit because the activations are in FP16 instead of FP32, but that's not always enough to let you increase the batch size.
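For context, here is roughly what mixed-precision training looks like with PyTorch's `torch.cuda.amp` (a minimal sketch with a toy model and random data, not from the original question): the parameters stay in FP32, and the forward/backward passes run eligible ops in FP16, which is where the extra half-precision copies and the FP16 activations come from.

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

# Toy model and data purely for illustration (assumed, not from the original post).
model = nn.Linear(1024, 1024).cuda()       # parameters live in FP32 (the "master" weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
scaler = GradScaler()                      # loss scaling to avoid FP16 gradient underflow

for step in range(10):
    inputs = torch.randn(32, 1024, device="cuda")
    targets = torch.randn(32, 1024, device="cuda")

    optimizer.zero_grad()
    with autocast():                       # eligible ops run in FP16 here,
        outputs = model(inputs)            # so activations are stored in half precision
        loss = loss_fn(outputs, targets)

    scaler.scale(loss).backward()          # backward pass through the scaled loss
    scaler.step(optimizer)                 # the optimizer still updates the FP32 weights
    scaler.update()
```

So the FP32 weights never go away; the FP16 side is added on top of them, which is why the weight memory goes up rather than down.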