I am using a pretrained model with processor=Wav2Vec2ProcessorWithLM(...) and model=Wav2Vec2ForCTC(...) to transcribe some files. I run the files in batches with zero padding. While trying to track down out-of-memory errors, I ran a batch that I know fits in memory twice in a row. The first run completes fine, but the second one fails with a CUDA out-of-memory error. Since I am freeing the variables and clearing the cache between runs, I can only assume that the model is not releasing memory held by internal data structures. Using torch.cuda.memory_summary() I can see that the initially reserved memory is 1206 MB; after running the first batch it is 9432 MB, and after cleaning up it only drops to 3938 MB instead of going back to 1206 MB. Any suggestions?
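For reference, this is roughly how I read those numbers; report_memory is just a small helper I use for illustration, not part of the transcription code:

import torch

def report_memory(tag):
    # Illustrative helper: print allocated vs. reserved CUDA memory in MB
    allocated = torch.cuda.memory_allocated() / 1024 ** 2
    reserved = torch.cuda.memory_reserved() / 1024 ** 2
    print(f"{tag}: allocated={allocated:.0f} MB, reserved={reserved:.0f} MB")
    # Full per-pool breakdown from the caching allocator
    print(torch.cuda.memory_summary(abbreviated=True))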
Here is a snippet of how I run it:
with torch.no_grad():
    # Pad the batch and move the input tensor to the GPU
    input_values = self.processor(
        batch_data, sampling_rate=sampling_rate, return_tensors="pt", padding=True
    ).input_values.to(self.device)
    self.model.eval()
    # Forward pass; pull the logits back to the CPU before decoding
    logits = self.model(input_values).logits.cpu()
    transcriptions = self.processor.batch_decode(logits.detach().numpy()).text
    # Free everything and release cached GPU memory
    del batch_data
    del input_values
    del logits
    del transcriptions
    gc.collect()
    torch.cuda.empty_cache()
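And this is roughly how I reproduce it; transcribe stands in for the snippet above, report_memory for the helper further up, and transcriber for the object that holds self.model and self.processor (the names are just placeholders):

# Same batch, run twice in a row
report_memory("baseline")                           # reserved memory ~1206 MB
transcriber.transcribe(batch_data, sampling_rate)   # first run completes fine
report_memory("after first batch and cleanup")      # reserved memory ~3938 MB, not 1206 MB
transcriber.transcribe(batch_data, sampling_rate)   # second run: RuntimeError: CUDA out of memory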
I tried with and without self.model.eval().
I am running with torch version 1.11.0+cu113.
The machine runs Ubuntu 20.04 with a GeForce RTX 3090 (24265 MiB of VRAM), and I am the only one using it. Let me know if you need more information.