CPU memory usage becomes much lower after re-saving the model as safetensors

I downloaded a GPT2-3.5B model from Hugging Face. When I used `GPT2LMHeadModel.from_pretrained` to load the model from disk and run inference on CPU, the memory usage was approximately 14 GB. But after I used `model.save_pretrained` to save the model and tokenizer to a new folder and then loaded the model from there, the CPU memory usage was only about 1 GB, and the generated content was still correct.
It is worth noting that the original model weights are a `.bin` (PyTorch) file, while after saving they are in the safetensors format.
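
Here is a minimal sketch of the steps I followed. The paths are placeholders for my local folders, and I measured resident memory with `psutil`; each load was actually done in a fresh process, shown together here only for brevity:

```python
import os

import psutil
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def rss_gb() -> float:
    """Resident memory of the current process in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e9

# Load the original checkpoint (weights stored as pytorch_model.bin)
model = GPT2LMHeadModel.from_pretrained("path/to/gpt2-3.5b")
tokenizer = GPT2Tokenizer.from_pretrained("path/to/gpt2-3.5b")
print(f"RSS after loading .bin weights: {rss_gb():.1f} GB")  # ~14 GB for me

# Re-save model and tokenizer; recent transformers versions write
# model.safetensors by default instead of pytorch_model.bin
model.save_pretrained("path/to/gpt2-3.5b-resaved")
tokenizer.save_pretrained("path/to/gpt2-3.5b-resaved")

# Load again from the re-saved folder (run in a fresh process)
model2 = GPT2LMHeadModel.from_pretrained("path/to/gpt2-3.5b-resaved")
print(f"RSS after loading safetensors weights: {rss_gb():.1f} GB")  # ~1 GB for me
```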
