Bert NextSentence memory leak

I am using BERT for next-sentence prediction on a CPU. I want to call the model twice in a row to select a sentence from a list of sentence pairs, using a batch size of 128. My code looks something like this:

def bert_batch_compare(self, prompt1, prompt2):
    encoding = self.tokenizer(prompt1, prompt2, return_tensors='pt', padding=True, truncation=True, add_special_tokens=True)
    target = torch.ones((1,len(prompt1)), dtype=torch.long)
    outputs = self.model(**encoding, next_sentence_label=target)
    logits = outputs.logits.detach()
    return logits
    
def call_bert(self):
    batch_pattern = []
    batch_template = []
    batch_input = []
    ## make batch_pattern, batch_input, and batch_template here!
    si = self.bert_batch_compare(batch_pattern, batch_input)
    ## the second call to this function is killed because of memory limitations
    sj = self.bert_batch_compare(batch_input, batch_template)

If I lower the batch size to something like 24 it runs, but I’d like to use a larger batch size. I am not doing any training right now. I’m using ‘bert-base-uncased’. During the second call to ‘bert_batch_compare()’ the memory usage climbs to 100% and the program crashes. I have 16 GB to work with; up to that point the code only uses about 1.8 GB. I am on Linux with Python 3.6 and PyTorch 1.8.

This might not be a memory leak, but rather a case of one batch being padded to a much longer length.

If the longest sequence in the first batch is 80 tokens, that batch is (likely) padded to a length of 80, which may fit into memory. But if the longest sequence in the next batch contains 250 tokens, the whole batch is padded to 250, and that might no longer fit. So check the length of each individual sample to be sure, as in the sketch below.
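For example, a quick way to check this (a rough sketch, assuming batch_pattern and batch_input are the same lists of strings you pass to bert_batch_compare()) is to tokenize without padding and look at the per-sample lengths:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize without padding so every sample keeps its own length.
enc = tokenizer(batch_pattern, batch_input, padding=False, truncation=False,
                add_special_tokens=True)
lengths = [len(ids) for ids in enc['input_ids']]
print('longest sample in this batch:', max(lengths))
print('average sample length:', sum(lengths) / len(lengths))

With padding=True the whole batch is padded up to the longest of these lengths, so a single long outlier is enough to multiply the memory footprint of the batch.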

Before calling the tokenizer I now limit the strings in the batches to 80 characters. That works, or seems to, but I have one question: is 80 characters the limit, or is that determined by experiment? Am I right to limit the string length, or should I be limiting the number of tokens instead?

80 was just an example. You should look at your dataset and find out how long your sentences are and where the information typically sits. If you have very long sentences but most of the information is at the end, you need to truncate the front, etc. Input data analysis is key. Once you have determined the best length, you can pass it to the tokenizer with max_length:

self.tokenizer(prompt1, prompt2, return_tensors='pt', padding=True, truncation=True, max_length=128)

Note that this max length counts the subword units produced by the tokenizer, not the number of words.
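You can see the difference by comparing the whitespace word count with the subword token count (just an illustration; the exact split depends on the vocabulary of ‘bert-base-uncased’):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

sentence = 'Tokenization often splits uncommon words into several subword pieces.'
tokens = tokenizer.tokenize(sentence)

# max_length counts these subword tokens (plus the [CLS] and [SEP]
# special tokens the tokenizer adds), not whitespace-separated words.
print(len(sentence.split()), 'words')
print(len(tokens), 'subword tokens')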

You should first determine this max length according to your data, and then adjust your batch size accordingly.
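As a minimal sketch of that kind of analysis, assuming pairs is a list of (sentence1, sentence2) tuples built from your data and tokenizer is the BertTokenizer from above (both names are just placeholders here):

# Collect the tokenized length of every pair in the dataset.
lengths = sorted(
    len(tokenizer(s1, s2, add_special_tokens=True)['input_ids'])
    for s1, s2 in pairs
)

# Pick a max_length that covers e.g. 95% of the pairs instead of the
# single longest outlier; anything longer is truncated by the tokenizer.
max_length = lengths[int(0.95 * (len(lengths) - 1))]
print('suggested max_length:', max_length)

With a fixed max_length like that, the memory use per batch becomes predictable, and you can raise the batch size again as far as your 16 GB allows.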

Thank you, that’s very clear.