Why is using my DistilBERT model for inference so slow?

I am using DistilBERT for sequence classification on a dataset of books. Since books naturally overflow the 512-token limit, I generate a new dataset that splits each book into chunks (preserving the labels from the original book). I fine-tuned a DistilBERT model on this dataset, and now I want to extract the average pooled output (the [CLS] token embedding) for each book.
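For context, the chunking step looks roughly like this (a simplified sketch of what I do; `chunk_book` is illustrative, my real code works on the tokenized dataset and uses no overlap between chunks):

```python
# Illustrative sketch of the chunking step: split a book's token ids into
# fixed-size chunks, each chunk inheriting the book's label.

def chunk_book(token_ids, label, max_len=512):
    chunks = [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]
    return [(chunk, label) for chunk in chunks]

# e.g. a 1300-token book becomes chunks of 512, 512 and 276 tokens,
# all labelled with the book's genre
chunks = chunk_book(list(range(1300)), "fantasy")
```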

The way I am doing this is as follows:

def getAveragePooledOutputs(model, encoded_dataset):

  book_embeddings_dataset = {'meaned_pooled_output': [], 'book_title': [], 'genre': [], 'labels': []}
  # array telling us the starts of each book in the dataset
  book_changes = get_book_changes_idx(encoded_dataset['book_title']) 

  for i in range(len(book_changes)):
    start = book_changes[i]
    # the last book runs to the end of the dataset
    if i != len(book_changes) - 1:
      end = book_changes[i+1]
    else:
      end = len(encoded_dataset['input_ids'])

    input_ids = torch.LongTensor(encoded_dataset['input_ids'][start:end])
    attention_mask = torch.BoolTensor(encoded_dataset['attention_mask'][start:end])

    with torch.no_grad():
      # [CLS] embedding of each chunk, shape (num_chunks, hidden_size)
      embeddings = model.distilbert(input_ids=input_ids, attention_mask=attention_mask)[0][:, 0]
      book_embeddings = torch.mean(embeddings, dim=0) # mean over the book's chunks

    book_embeddings_dataset['meaned_pooled_output'].append(book_embeddings)
    # (I also append book_title, genre and labels here; omitted for brevity)

  return book_embeddings_dataset

This seems to work, except that it is painfully slow; it is much slower than training the model, for example. Why is it so slow, and is there any way I can speed it up?