GPT2: many bad_words_ids leading to slow text generation?

For many models, like GPT2, the generate function accepts bad_words_ids. We're currently passing about 2500 tokenized phrases into this. It works correctly, but it slows down inference considerably: a generate call that takes about 250ms without the bad_words_ids takes about 500ms with them.
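To make the cost concrete, here is a toy, pure-Python sketch of roughly what a bad-words logits processor has to do at every generation step (this is a simplification for illustration, not the library's actual internals; the function names are made up). The key point is the Python loop over all phrases, which runs once per generated token, so the per-step cost grows linearly with the number of phrases:

```python
# Toy sketch (illustrative only) of the per-step check a bad-words
# logits processor performs. All names here are hypothetical.

NEG_INF = float("-inf")

def banned_tokens_for_step(generated_ids, bad_words_ids):
    """Return the set of token ids that must be blocked at this step."""
    banned = set()
    for phrase in bad_words_ids:  # Python loop over ALL phrases...
        prefix, last = phrase[:-1], phrase[-1]
        # ...at EVERY step: cost scales with len(bad_words_ids).
        if len(prefix) == 0 or generated_ids[-len(prefix):] == prefix:
            banned.add(last)
    return banned

def apply_ban(logits, banned):
    """Set banned token scores to -inf so they can never be sampled."""
    return [NEG_INF if i in banned else s for i, s in enumerate(logits)]

# Example: ban the single token 2, and token 3 when it would follow token 1.
banned = banned_tokens_for_step([0, 1], [[2], [1, 3]])
print(sorted(banned))                          # [2, 3]
print(apply_ban([0.1, 0.2, 0.3, 0.4], banned)) # [0.1, 0.2, -inf, -inf]
```

With 2500 phrases, this kind of per-step Python-side work running alongside the model forward pass would plausibly account for the roughly 2x slowdown we see.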

Maybe there is just no solution for this, and we simply need to curtail our usage of bad_words_ids?

We were also looking at this code:

The bad_words_ids are required to be passed in as a list. If we could somehow use a tensor instead, and put that tensor on the GPU, would that speed it up?
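One idea we were considering, sketched below with plain lists so it stays self-contained (in practice the mask would be a torch tensor kept on the GPU and added to the logits each step): if most of our banned phrases are single tokens, the per-step loop could be replaced with an additive mask built once up front, so the per-step cost no longer depends on the phrase count. The function names here are hypothetical, not part of the transformers API:

```python
# Hedged sketch: precompute an additive ban mask ONCE, then apply it
# with a single elementwise add per step. Plain lists are used here
# only to keep the example runnable; a real version would use a torch
# tensor on the GPU.

VOCAB_SIZE = 8
NEG_INF = float("-inf")

def build_ban_mask(single_token_bad_ids, vocab_size=VOCAB_SIZE):
    """Precompute a mask: 0.0 for allowed tokens, -inf for banned ones."""
    mask = [0.0] * vocab_size
    for tok in single_token_bad_ids:
        mask[tok] = NEG_INF
    return mask

def apply_mask(logits, mask):
    # One vectorizable elementwise add per step,
    # independent of how many tokens are banned.
    return [s + m for s, m in zip(logits, mask)]

mask = build_ban_mask([2, 5])
print(apply_mask([1.0] * VOCAB_SIZE, mask))
```

This only covers single-token bans; multi-token phrases still need the prefix check, but splitting the list that way might shrink the expensive loop substantially.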

Any other suggestions?