I downloaded a GPT2-3.5B model from Hugging Face; here are the files.
I load the model with
GPT2LMHeadModel.from_pretrained(path)
and the CPU memory usage during inference is 14 GB. But after I save the model with model.save_pretrained()
to a new folder and load it from there for inference, the memory usage is only 1 GB. I tested both .safetensors and .bin formats and got the same result.
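For reference, here is a minimal sketch of the save/reload flow I'm describing. It uses a tiny GPT2Config so it runs anywhere (the real checkpoint path and sizes are placeholders), measures resident memory around the load, and prints the dtype of the loaded weights, since a dtype difference between the original and re-saved checkpoints is one possible cause of a memory gap like this:

```python
import resource
import tempfile

from transformers import GPT2Config, GPT2LMHeadModel


def max_rss_mb():
    """Peak resident set size of this process, in MB.

    Note: ru_maxrss is reported in KB on Linux, in bytes on macOS.
    This assumes Linux.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


# Tiny config standing in for the real GPT2-3.5B checkpoint
# (hypothetical sizes, chosen only so the script runs quickly).
config = GPT2Config(n_embd=64, n_layer=2, n_head=2, vocab_size=1000)
model = GPT2LMHeadModel(config)

with tempfile.TemporaryDirectory() as tmp:
    # Re-save with save_pretrained(), as in the question
    # (writes model.safetensors by default in recent transformers).
    model.save_pretrained(tmp)

    before = max_rss_mb()
    reloaded = GPT2LMHeadModel.from_pretrained(tmp)
    after = max_rss_mb()

print(f"peak RSS before reload: {before:.1f} MB, after: {after:.1f} MB")
# Comparing the stored dtype of the two checkpoints (e.g. fp32 vs fp16)
# is one thing worth checking when the memory footprints differ.
print("loaded weight dtype:", next(reloaded.parameters()).dtype)
```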