Regarding Quantizing gpt2-xl, gpt2-large, &c


I have spent some time now trying to quantize gpt2-large and am having trouble. I’m following the example provided here. The process fails when running quantizer.export. The specific error is:

FileNotFoundError: [Errno 2] No such file or directory: '.../transformer.wte.weight'

The error is self-explanatory. Judging by this discussion and the use_external_data_format flag (given the model is >2GiB), I am meant to be storing the gpt2 weights in transformer.wte.weight, etc., but I cannot, for the life of me, figure out how to export these files with ORTModelForCausalLM.save_pretrained.
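For anyone hitting the same error, here is a minimal sketch of a sanity check that can be run before quantizing. It only uses the standard library and assumes that the external weight files are expected to sit in the same folder as the exported .onnx file. The helper name `missing_external_files` and the list of expected names are hypothetical; the names should be taken from whatever the error message reports (e.g. transformer.wte.weight).

```python
from pathlib import Path


def missing_external_files(export_dir, expected_names):
    """Return the expected file names that are absent from export_dir.

    Assumption: the external tensor files of a large ONNX export live
    directly next to the .onnx file, one file per tensor name.
    `expected_names` is supplied by the caller, for example the names
    that appear in the FileNotFoundError message.
    """
    export_dir = Path(export_dir)
    return [name for name in expected_names if not (export_dir / name).is_file()]


def list_export_dir(export_dir):
    """Split the files in export_dir into .onnx graphs and everything else.

    The "everything else" bucket is where external weight files and
    tokenizer files would show up, so an empty bucket is a hint that the
    export did not write its external data there.
    """
    files = sorted(p for p in Path(export_dir).iterdir() if p.is_file())
    graphs = [p.name for p in files if p.suffix == ".onnx"]
    others = [p.name for p in files if p.suffix != ".onnx"]
    return graphs, others
```

Calling `missing_external_files(folder, ["transformer.wte.weight", "transformer.wpe.weight"])` on the export folder returns the subset of those names that is not present, which shows whether the export step or the quantization step is looking in the wrong place.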

Should I be pickling model.transformer.wte.weight, model.transformer.wpe.weight, etc. manually? I’ve tried loading the model with AutoModel and dumping the required files but no dice; mainly because I can’t dump model weights like transformer.h.0.ln_1.weight.

Any help would be much appreciated.

Here you go, I’ve prepared a gist: Converting gpt2-large to onnx with multiple external files and using it later for inference · GitHub

One script will create a folder with the exported .onnx and all of its dependent external files as well as a copy of the tokenizer. The other script will load the local onnx folder and use it for inference.

I’d be happy to know if you manage to quantize it and/or manage to store it in the hub in a way which doesn’t break the model page 🙂

Update: You might also want to check out this repo: GitHub - ELS-RD/transformer-deploy: Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀


Thank you kindly for both the gist and the repo! It had really been nagging me. Happy to report back that all is well and works as expected. Cheers.
