Hi, I am using the Transformers library for the first time.
My goal is to estimate how likely it is that one sentence follows another. For this I have a list of 100 sentences, and I want to compute the next-sentence score for every ordered pair of distinct sentences, i.e. each sentence against every other sentence except itself.
The code below takes a considerable amount of time to run. Is there any way to speed it up, for example by applying the tokenizer/model to the entire list at once instead of to one pair at a time? I am happy to receive any performance hints.
Many thanks,
SecondBrother
import torch
from tqdm import tqdm
from transformers import BertForNextSentencePrediction, BertTokenizer

def get_similarity_nsp_bert(top_x_results: list[str], comparing_results: list[str], model: BertForNextSentencePrediction, tokenizer: BertTokenizer) -> list[tuple[str, str]]:
    matched_results = {}
    for i, first_sent in enumerate(tqdm(top_x_results)):
        similarities = []
        for j, second_sent in enumerate(comparing_results):
            if i != j:
                # One tokenizer call and one forward pass per pair -- this is the slow part.
                encoding = tokenizer(first_sent, second_sent, return_tensors='pt')
                outputs = model(**encoding)
                # logits[0][0] is the "is next sentence" logit; no labels needed since the loss is unused.
                similarities.append((j, outputs.logits[0][0].item()))
        # Sort the candidate second sentences by descending next-sentence logit.
        sorted_similarities = sorted(similarities, key=lambda x: x[1], reverse=True)
        matched_results[i] = sorted_similarities
    return [(top_x_results[i], comparing_results[matched_results[i][0][0]]) for i in matched_results]
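
For concreteness, here is an untested sketch of the batched version I have in mind. I am assuming top_x_results and comparing_results are in fact the same list of 100 sentences (so I pass a single sentences list), and batch_size is a parameter I made up to trade speed against memory; the function name get_similarity_nsp_bert_batched is mine as well:

import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

def get_similarity_nsp_bert_batched(sentences: list[str], model: BertForNextSentencePrediction, tokenizer: BertTokenizer, batch_size: int = 32) -> list[tuple[str, str]]:
    # Build every ordered pair of distinct sentences up front.
    pairs = [(i, j) for i in range(len(sentences)) for j in range(len(sentences)) if i != j]
    firsts = [sentences[i] for i, _ in pairs]
    seconds = [sentences[j] for _, j in pairs]

    scores = []
    model.eval()
    with torch.no_grad():  # inference only, so skip gradient bookkeeping
        for start in range(0, len(pairs), batch_size):
            # Tokenize a whole batch of pairs at once, padding to the longest pair in the batch.
            encoding = tokenizer(firsts[start:start + batch_size], seconds[start:start + batch_size], return_tensors='pt', padding=True, truncation=True)
            logits = model(**encoding).logits
            # Column 0 holds the "is next sentence" logit for each pair in the batch.
            scores.extend(logits[:, 0].tolist())

    # For each first sentence, keep the second sentence with the highest logit.
    best: dict[int, tuple[int, float]] = {}
    for (i, j), score in zip(pairs, scores):
        if i not in best or score > best[i][1]:
            best[i] = (j, score)
    return [(sentences[i], sentences[best[i][0]]) for i in sorted(best)]

Does this look like the right direction? I would expect the batched tokenization plus torch.no_grad() to help noticeably, but I have not benchmarked it yet.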