Advice on speed and performance

Hey,

I get the feeling that I might be missing something about the performance, speed, and memory characteristics of Hugging Face Transformers. Since I like this repo and Hugging Face Transformers very much (!), I hope I am not overlooking anything, as I have hardly used any other BERT implementations. I use Hugging Face because I want to work with TF2.

Now, I would like to speed up inference and possibly decrease memory usage.

As a native TensorFlow user, I have no experience with the PyTorch models at all.

  • Is it possible that the PyTorch models are more performant and more efficient than the TF models?
  • How can I speed up inference? Encoding 200 sentence pairs on my CPU takes 12 seconds.
  • Is it more feasible to use the PyTorch models for inference, or even for training? Are there any differences in memory usage?
  • Why does bert-as-a-service appear to be more performant and faster? I hope I can test this.
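On the CPU-speed point: independent of backend, the usual first lever is to encode in batches rather than one pair at a time, so the per-call overhead is amortized. A minimal sketch of that pattern, with a hypothetical `encode_batch` standing in for the real tokenizer-plus-model call:

```python
def encode_batch(batch):
    # Hypothetical stand-in for the real call, which would look roughly like:
    #   model(tokenizer(batch, padding=True, truncation=True, return_tensors="tf"))
    # Here we just return one fixed-size vector per input sentence pair.
    return [[0.0] * 8 for _ in batch]

def encode_all(pairs, batch_size=32):
    # Process the inputs in chunks of `batch_size` instead of one by one;
    # with a real model this amortizes framework overhead across the batch.
    embeddings = []
    for i in range(0, len(pairs), batch_size):
        embeddings.extend(encode_batch(pairs[i:i + batch_size]))
    return embeddings

pairs = [f"sentence pair {i}" for i in range(200)]
embeddings = encode_all(pairs)
print(len(embeddings))  # → 200, one embedding per input pair
```

Capping the tokenizer's maximum sequence length (via `truncation`) is the other cheap win on CPU, since attention cost grows quickly with sequence length.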

I am asking because I stumbled across this:


Any advice on better usage (for deployment) would be much appreciated.

Is Hugging Face with PyTorch faster than with TensorFlow?

@jplu is currently working on making the TF2 models a lot faster!
The situation should improve soon (though probably still a few weeks away).

Hello!

As thomwolf said, we are currently working on a much more performant version of the TF models, so for now, yes, the PyTorch models are more optimized than the TF ones.

Bert-as-a-service is faster because it is highly optimized for inference:

  • it runs BERT with mixed precision (something we can do as well)
  • it freezes the model
  • it exposes a powerful/scalable service API
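To make the mixed-precision point concrete: half-precision tensors take half the memory of single-precision ones (in TF 2 this is enabled globally via `tf.keras.mixed_precision`; the sketch below uses NumPy only, to keep it dependency-light):

```python
import numpy as np

# A BERT-base-sized activation: batch 32, sequence length 128, hidden size 768.
act32 = np.ones((32, 128, 768), dtype=np.float32)
act16 = act32.astype(np.float16)

# float16 stores each value in 2 bytes instead of 4, halving activation memory.
print(act32.nbytes // act16.nbytes)  # → 2
```

Note that on CPU the benefit of float16 is mostly memory; the large inference speedups from mixed precision come from GPUs with hardware float16 support.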

If you are looking for performant inference for your TF model, I suggest you take a look at ONNX; we provide a script in the repo to create your own optimized ONNX model. Afterwards, you can run a Triton server to serve your model.
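In practice the ONNX path looks roughly like the sketch below: export the model with the `convert_graph_to_onnx` helper shipped in the repo, then open the file with ONNX Runtime (or point a Triton model repository at it). The exact `convert` signature and argument values here are assumptions based on the script as it existed at the time, so check them against your installed version:

```python
from pathlib import Path

def export_and_load(model_name, output_path):
    """Export a transformers model to ONNX and open an inference session.

    Imports are deferred so this sketch stays importable without the heavy
    dependencies installed; the `convert(...)` signature is an assumption
    based on the repo's convert_graph_to_onnx script and may differ by version.
    """
    from transformers.convert_graph_to_onnx import convert
    import onnxruntime as ort

    # Export the TF checkpoint to an ONNX graph on disk.
    convert(framework="tf", model=model_name,
            output=Path(output_path), opset=11)
    # ONNX Runtime applies its own graph optimizations at session load time.
    return ort.InferenceSession(output_path)
```

Usage would be something like `session = export_and_load("bert-base-uncased", "onnx/bert.onnx")`; the same `.onnx` file is what you would hand to Triton.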


Hey, thank you. Could I also take a model fine-tuned in TF (with Hugging Face) and use it with bert-as-a-service?