Why is using my DistilBERT model for inference so slow?

I am using DistilBERT for sequence classification on a dataset of books. Since books naturally overflow the 512-token limit, I generate a new dataset that splits each book into chunks (preserving the labels from the original book). I fine-tuned a DistilBERT model on this dataset, and now I want to extract the average pooled output (the [CLS] token embedding) for each book.
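For context, the chunking step looks roughly like this (a simplified sketch of what I do; `chunk_book` is illustrative, my real code works on the tokenized dataset and uses no overlap between chunks):

```python
# Illustrative sketch of the chunking step: split a book's token ids into
# fixed-size chunks, each chunk inheriting the book's label.

def chunk_book(token_ids, label, max_len=512):
    chunks = [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]
    return [(chunk, label) for chunk in chunks]

# e.g. a 1300-token book becomes chunks of 512, 512 and 276 tokens,
# all labelled with the book's genre
chunks = chunk_book(list(range(1300)), "fantasy")
```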

The way I am doing this is as follows:

def getAveragePooledOutputs(model, encoded_dataset):

  book_embeddings_dataset = {'meaned_pooled_output': [], 'book_title': [], 'genre': [], 'labels': []}
  # array telling us the starts of each book in the dataset
  book_changes = get_book_changes_idx(encoded_dataset['book_title']) 

  for i in range(len(book_changes)):
    start = book_changes[i]
    # the last book runs to the end of the dataset
    if i != len(book_changes) - 1:
      end = book_changes[i+1]
    else:
      end = len(encoded_dataset['input_ids'])

    input_ids = torch.LongTensor(encoded_dataset['input_ids'][start:end])
    attention_mask = torch.BoolTensor(encoded_dataset['attention_mask'][start:end])

    with torch.no_grad():
      # [CLS] embedding of each chunk, shape (num_chunks, hidden_size)
      embeddings = model.distilbert(input_ids=input_ids, attention_mask=attention_mask)[0][:, 0]
      book_embeddings = torch.mean(embeddings, dim=0) # mean over the book's chunks

    book_embeddings_dataset['meaned_pooled_output'].append(book_embeddings)
    # (I also append book_title, genre and labels here; omitted for brevity)

  return book_embeddings_dataset

This seems to work, except that it is painfully slow; it is much slower than training the model, for example. Why is it so slow, and is there any way I can speed it up?