Hi there! I am trying to work with a corpus I built that contains batch encodings of size 256 (I passed lists of 256 sentences at a time to a fast BERT tokenizer and pickled the outputs).
Unfortunately, when I try to pass this entire BatchEncoding object into my model to get prediction logits, the GPU runs out of memory.
I am now trying to “unpack” each BatchEncoding object into single inputs that can be fed to the model one at a time, but I cannot figure out how. Does anyone know how I can accomplish this?
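For context, here is roughly the kind of unpacking I have in mind. This is just a sketch using a plain dict of lists as a stand-in for the real BatchEncoding (the field names `input_ids` and `attention_mask` and the batch size of 8 are my assumptions); I am hoping the same slicing works on the actual tensor fields:

```python
# Hypothetical sketch: split a BatchEncoding-style dict into smaller
# mini-batches. `enc` stands in for the real pickled BatchEncoding;
# here it is a plain dict of lists so the example is self-contained.

def iter_minibatches(enc, batch_size=8):
    """Yield dicts whose values are aligned slices of every field."""
    n = len(enc["input_ids"])
    for start in range(0, n, batch_size):
        yield {k: v[start:start + batch_size] for k, v in enc.items()}

# Toy stand-in for one of my 256-example encodings.
enc = {
    "input_ids": [[101, 7592, 102]] * 256,
    "attention_mask": [[1, 1, 1]] * 256,
}

batches = list(iter_minibatches(enc, batch_size=8))
# I would then feed each batch to the model separately, e.g.
# `model(**batch)` inside a `torch.no_grad()` block.
```

With tensors the slice `v[start:start + batch_size]` should behave the same way, since slicing a tensor along the first dimension gives the corresponding rows, but I am not certain this is the intended way to take apart a BatchEncoding.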
Alternatively, does anyone know how much memory one of these BatchEncoding objects might take up? It does not seem reasonable to me for 256 BERT encodings to eat up 12 GB of GPU RAM.