I am using a pretrained BERT (TFBertModel from the transformers library) to encode several batches of sentences with varying batch size. That is, I need to use BERT to encode a series of inputs, where each input has shape [n_sentences, 512] (512 being the number of tokens per sentence).
n_sentences can vary between 2 and 250 across inputs/examples.
This is proving very time-consuming: encoding each input/example takes several seconds, especially for larger values of n_sentences.
Is there an (easy) way to parallelize the model(input) call (where, again, input has shape [n_sentences, 512]) on Google Colab's TPU (or on a GPU), so that more than one sentence is encoded at once?
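For context, one direction I'm considering: on TPU (XLA) and with tf.function-compiled calls on GPU, every distinct input shape triggers a fresh graph compilation, so 249 different values of n_sentences can mean 249 compilations. Padding the sentence dimension up to a small set of bucket sizes bounds that cost. A minimal sketch of the padding step, assuming the tokenized inputs are NumPy arrays; the bucket sizes and the variable names here are my own illustration, not part of the transformers API:

```python
import numpy as np

def pad_to_bucket(token_ids, attention_mask, buckets=(8, 32, 64, 128, 256)):
    """Pad the sentence dimension up to the next bucket size, so that a
    compiled model call sees only a handful of distinct input shapes.
    Returns the padded arrays plus the original row count, so the
    padding rows can be dropped from the output afterwards."""
    n = token_ids.shape[0]
    # Fall back to the original size if n exceeds the largest bucket.
    target = next((b for b in buckets if b >= n), n)
    pad = target - n
    token_ids = np.pad(token_ids, ((0, pad), (0, 0)))       # zero-pad rows
    attention_mask = np.pad(attention_mask, ((0, pad), (0, 0)))  # mask = 0 on pads
    return token_ids, attention_mask, n

# Hypothetical usage with a tf.function-wrapped TFBertModel (not run here):
#   ids, mask, n = pad_to_bucket(ids, mask)
#   outputs = encode(ids, mask)                    # compiled once per bucket
#   embeddings = outputs.last_hidden_state[:n]     # discard the padding rows
```

Since the padding rows carry attention_mask = 0, they should not affect the real sentences' encodings; only a few graph compilations (one per bucket) are paid up front.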