Why is BERT not in TGI?

I want to run the BERT model. Why is it not listed in text-generation-inference's Supported Models and Hardware?


BERT is an encoder-only Transformer, suited to discriminative tasks such as classification. TGI is built for decoder-only Transformers, which handle generative tasks by producing tokens one at a time (autoregressively).
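The distinction can be sketched with a toy example (plain Python stand-ins, not real model code): an encoder-style classifier makes one pass over the whole input and emits a single label, while a decoder-style generator runs a loop in which each new token depends on everything emitted so far. That loop is what TGI is built to serve efficiently.

```python
def classify(tokens):
    """Encoder-style inference: one pass over the full input, one label out.
    A trivial counting rule stands in for a real BERT classifier."""
    return "positive" if tokens.count("good") > tokens.count("bad") else "negative"

def generate(prompt, steps):
    """Decoder-style inference: a loop that appends one token per step,
    where each step sees all previously emitted tokens."""
    tokens = list(prompt)
    for _ in range(steps):
        # Stand-in for a language model's next-token prediction.
        next_token = f"tok{len(tokens)}"
        tokens.append(next_token)
    return tokens

print(classify(["this", "is", "good"]))   # one-shot: "positive"
print(generate(["hello"], 3))             # iterative: grows one token at a time
```

TGI's optimizations (continuous batching, KV caching, token streaming) all target that generation loop, which is why a single-pass encoder model like BERT is out of scope.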

If you want to optimize BERT for production, I’d recommend taking a look at the ONNX export support available in the Optimum library: Convert Transformers to ONNX with Hugging Face Optimum.