GPT-2 inference with ONNX and quantization

FYI there’s a nice section in the docs that explains the various text generation strategies and how they’re implemented: Utilities for Generation — transformers 4.2.0 documentation
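
For a rough idea of what two of those strategies do at a single decoding step, here's a minimal sketch in plain Python. It assumes the model has already produced a list of logits for the next token; the helper names (`softmax`, `greedy_pick`, `top_k_sample`) are hypothetical and are not the transformers API, which implements these strategies inside its own generation utilities.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng):
    """Top-k sampling: keep the k best token ids, renormalize, then sample."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    return rng.choices(top, weights=probs, k=1)[0]


# Hypothetical next-token logits over a 4-token vocabulary.
logits = [0.1, 2.5, -1.0, 1.7]
rng = random.Random(0)  # seeded so repeated runs give the same draw

greedy_token = greedy_pick(logits)
sampled_token = top_k_sample(logits, k=2, rng=rng)
```

With an exported ONNX model you usually run this kind of loop yourself: feed the tokens so far, take the logits at the last position, pick the next token with one of these rules, and repeat.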
