CPU memory usage becomes much lower after re-saving the model as safetensors

I downloaded a GPT2-3.5B model from Hugging Face. When I used `GPT2LMHeadModel.from_pretrained` to load the model from disk and run inference on CPU, the memory usage was approximately 14 GB. But after I used `model.save_pretrained` to save the model and tokenizer to a new folder and then loaded the model from there, the CPU memory usage was only about 1 GB, and the generated content was still correct.
It is worth noting that the original model weights are a `.bin` (PyTorch) file, while after saving they are in the safetensors format.
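
Here is a minimal sketch of the steps I followed. The paths are placeholders for my local folders, and I measured resident memory with `psutil`; each load was actually done in a fresh process, shown together here only for brevity:

```python
import os

import psutil
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def rss_gb() -> float:
    """Resident memory of the current process in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e9

# Load the original checkpoint (weights stored as pytorch_model.bin)
model = GPT2LMHeadModel.from_pretrained("path/to/gpt2-3.5b")
tokenizer = GPT2Tokenizer.from_pretrained("path/to/gpt2-3.5b")
print(f"RSS after loading .bin weights: {rss_gb():.1f} GB")  # ~14 GB for me

# Re-save model and tokenizer; recent transformers versions write
# model.safetensors by default instead of pytorch_model.bin
model.save_pretrained("path/to/gpt2-3.5b-resaved")
tokenizer.save_pretrained("path/to/gpt2-3.5b-resaved")

# Load again from the re-saved folder (run in a fresh process)
model2 = GPT2LMHeadModel.from_pretrained("path/to/gpt2-3.5b-resaved")
print(f"RSS after loading safetensors weights: {rss_gb():.1f} GB")  # ~1 GB for me
```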
