Memory usage during inference on CPU

I downloaded a GPT2-3.5B model from Hugging Face; here are the files:


I load the model simply with GPT2LMHeadModel.from_pretrained(path), and the CPU memory usage during inference is 14 GB. But after I save the model with model.save_pretrained() to a new folder and then load it from there for inference, the memory usage is only 1 GB. I tested both .safetensors and .bin formats and got the same result.
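For comparing the two cases, here is a small helper I use to read peak resident memory of the current process (a sketch using only the standard-library `resource` module; note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS, so the unit handling below assumes Linux):

```python
import resource

def peak_rss_gb() -> float:
    """Return the peak resident set size of this process in GB (Linux units)."""
    # On Linux, ru_maxrss is in kilobytes; divide by 1024**2 to get GB.
    kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return kb / 1024**2

# Example: call once right after model loading, and again after inference,
# to see where the memory is actually allocated.
print(f"peak RSS: {peak_rss_gb():.2f} GB")
```

Calling this after `from_pretrained()` and again after the first forward pass would show whether the 14 GB is allocated at load time or during inference.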
