Regarding Quantizing gpt2-xl, gpt2-large, &c

srhm · August 9, 2022, 5:48pm

Hello,

I have spent some time now trying to quantize gpt2-large and am having trouble. I’m following the example provided here. The process fails when running quantizer.export. The specific error is:

FileNotFoundError: [Errno 2] No such file or directory: '.../transformer.wte.weight'

The error is self-explanatory. Judging by this discussion and the use_external_data_format flag (given the model is >2GiB), I am meant to be storing the gpt2 weights in transformer.wte.weight, etc., but I cannot, for the life of me, figure out how to export these files with ORTModelForCausalLM.save_pretrained.

Should I be pickling model.transformer.wte.weight, model.transformer.wpe.weight, etc. manually? I’ve tried loading the model with AutoModel and dumping the required files but no dice; mainly because I can’t dump model weights like transformer.h.0.ln_1.weight.

Any help would be much appreciated.

Norod78 · August 10, 2022, 8:02am

Here you go, I’ve prepared a gist: “Converting gpt2-large to onnx with multiple external files and using it later for inference” (GitHub)

One script will create a folder containing the exported .onnx file, all of the external files it depends on, and a copy of the tokenizer. The other script will load that local onnx folder and use it for inference.
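If you hit the FileNotFoundError again, it may help to check that the folder really contains every external file the .onnx graph points at. This is only a rough sketch, not taken from the gist: it assumes the onnx Python package is installed, both function names are made up for illustration, and it only looks at graph initializers (not tensors stored in node attributes).

```python
import os


def external_data_locations(onnx_path):
    """Return the relative file names an ONNX model expects next to it.

    Hypothetical helper; only graph initializers are inspected.
    """
    import onnx  # assumption: the onnx package is installed

    # load_external_data=False reads the graph only, not the weight files,
    # so this works even when some of those files are missing.
    model = onnx.load(onnx_path, load_external_data=False)
    locations = set()
    for tensor in model.graph.initializer:
        if tensor.data_location == onnx.TensorProto.EXTERNAL:
            for entry in tensor.external_data:
                if entry.key == "location":
                    locations.add(entry.value)
    return sorted(locations)


def missing_external_files(model_dir, locations):
    """Return the entries of `locations` that are not files inside `model_dir`."""
    return [
        name
        for name in locations
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
```

Calling missing_external_files(folder, external_data_locations(path_to_onnx_file)) on the exported folder should give an empty list if nothing was left behind; names such as transformer.wte.weight showing up in the result would match the error above.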

I’d be happy to know if you manage to quantize it and/or manage to store it on the hub in a way that doesn’t break the model page.

Update: You might also want to check out this repo: GitHub - ELS-RD/transformer-deploy: Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀

srhm · August 10, 2022, 9:52pm

Thank you kindly for both the gist and the repo! It had really been nagging me. Happy to report back that all is well and works as expected. Cheers.
