Reduce inference time with batches

Hi everyone.

I want to further reduce the inference time of my BERT model.
Here is the code I am currently using:

import torch

for sentence in list(data_dict.values()):
    # tokenize one sentence at a time
    tokens = {'input_ids': [], 'attention_mask': []}
    new_tokens = tokenizer.encode_plus(sentence, max_length=512,
                                       truncation=True, padding='max_length',
                                       return_tensors='pt',
                                       return_attention_mask=True)
    tokens['input_ids'].append(new_tokens['input_ids'][0])
    tokens['attention_mask'].append(new_tokens['attention_mask'][0])

    # reformat the lists of tensors into single tensors
    tokens['input_ids'] = torch.stack(tokens['input_ids'])
    tokens['attention_mask'] = torch.stack(tokens['attention_mask'])

    # forward pass for a single sentence
    outputs = model(**tokens)
    embeddings = outputs[0]  # last hidden state for this sentence

How can I pass the sentences to the model in batches, like during training, rather than one at a time or the whole dataset at once?
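
To illustrate what I am after, here is a rough sketch of the direction I had in mind (my own guess, not tested). I am assuming the tokenizer can take a list of sentences directly, and the batch size of 16 is an arbitrary value I picked for illustration:

import torch

sentences = list(data_dict.values())
batch_size = 16  # arbitrary value, just for illustration

all_embeddings = []
model.eval()
with torch.no_grad():  # no gradients needed at inference time
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i + batch_size]
        # tokenize the whole batch in one call instead of sentence by sentence
        tokens = tokenizer(batch, max_length=512, truncation=True,
                           padding='max_length', return_tensors='pt',
                           return_attention_mask=True)
        outputs = model(**tokens)
        all_embeddings.append(outputs[0])  # last hidden state for the batch

embeddings = torch.cat(all_embeddings, dim=0)

Is something along these lines the right approach, or is there a better way?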