I downloaded a GPT2-3.5B model from Hugging Face. When I load it from disk with `GPT2LMHeadModel.from_pretrained` for CPU inference, memory usage is approximately 14 GB. But after I save the model and tokenizer to a new folder with `model.save_pretrained` and load that copy for inference, CPU memory usage is only about 1 GB, and the generated content is still correct.
It is worth noting that the original model weights are a PyTorch `.bin` file, while the re-saved copy uses safetensors.
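For reference, here is a minimal sketch of the steps I described, assuming the downloaded model lives in `./gpt2-3.5b` (the local paths and the prompt are just placeholders):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the original checkpoint (pytorch_model.bin) for CPU inference.
model = GPT2LMHeadModel.from_pretrained("./gpt2-3.5b")       # hypothetical path
tokenizer = GPT2Tokenizer.from_pretrained("./gpt2-3.5b")

# Re-save model and tokenizer; recent transformers versions write
# model.safetensors by default instead of pytorch_model.bin.
model.save_pretrained("./gpt2-3.5b-resaved")
tokenizer.save_pretrained("./gpt2-3.5b-resaved")

# Reload the re-saved copy and generate to check the output is unchanged.
model2 = GPT2LMHeadModel.from_pretrained("./gpt2-3.5b-resaved")
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model2.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Why would the same weights use roughly 14 GB in one format and about 1 GB in the other?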