For many models, like GPT2, the `generate` function accepts `bad_words_ids`. We're currently passing about 2500 tokenized phrases into this, and finding that it works well, but also that it slows down inference considerably: with 2500 phrases, a `generate` call that takes about 250ms without `bad_words_ids` takes about 500ms with them.
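
For context, here's roughly the shape of what we're doing. A minimal sketch: the phrase list, prompt, and generation length are placeholders, not our real inputs.

```python
import time

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to("cuda").eval()

# Stand-in for our ~2500 banned phrases.
bad_phrases = ["example phrase one", "example phrase two"]
bad_words_ids = tokenizer(bad_phrases, add_special_tokens=False).input_ids

input_ids = tokenizer("Some prompt text", return_tensors="pt").input_ids.to("cuda")

start = time.perf_counter()
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=50,
        bad_words_ids=bad_words_ids,  # list of lists of token ids
    )
print(f"generate took {(time.perf_counter() - start) * 1000:.0f} ms")
```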
Maybe there is just no solution for this, and we simply need to curtail our usage of `bad_words_ids`?
We were also looking at this code: the `bad_words_ids` are required to be passed in as a `list`. If we could somehow use a tensor instead, and put that tensor on the GPU, would that speed it up?
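
To make the tensor idea concrete, here's a rough sketch of what we had in mind: a custom `LogitsProcessor` holding a precomputed ban tensor on the GPU, passed to `generate` via `logits_processor`. This only covers phrases that tokenize to a single token (multi-token phrases would still need the prefix matching that `NoBadWordsLogitsProcessor` does), so it's an illustration of the idea rather than a drop-in replacement:

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class TensorBadTokensProcessor(LogitsProcessor):
    """Bans a fixed set of single-token ids with one vectorized masking op.

    The ban list lives on the GPU as a tensor, so each decoding step is a
    single index assignment instead of a Python loop over 2500 phrases.
    """

    def __init__(self, bad_token_ids, device):
        # Build the ban tensor once, up front, on the model's device.
        self.bad_token_ids = torch.tensor(bad_token_ids, dtype=torch.long, device=device)

    def __call__(self, input_ids, scores):
        # One advanced-indexing write per decoding step, fully on the GPU.
        scores[:, self.bad_token_ids] = float("-inf")
        return scores

# Continuing from the snippet above (model, input_ids, bad_words_ids):
single_token_ids = [ids[0] for ids in bad_words_ids if len(ids) == 1]
processors = LogitsProcessorList([TensorBadTokensProcessor(single_token_ids, model.device)])
output = model.generate(input_ids, max_new_tokens=50, logits_processor=processors)
```

Would something along these lines be feasible for the multi-token case too, e.g. tracking phrase prefixes with tensor ops?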
Any other suggestions?
Thanks!