To run the 7B model in full precision, you need 7 * 4 = 28GB of GPU RAM. You should add torch_dtype=torch.float16
to use half the memory and fit the model on a T4.
15 Likes
To run the 7B model in full precision, you need 7 * 4 = 28GB of GPU RAM. You should add torch_dtype=torch.float16
to use half the memory and fit the model on a T4.