Well, it doesn’t seem like I can edit the original post anymore, but after testing 4 additional variants, here is a version that I believe is working (it will be another 10 hours before it finishes and I can confirm):
import warnings

from tqdm import tqdm
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset

# model, tokenizer, and raw_dataset are defined earlier in the thread.
# device = 0 puts the pipeline on the GPU, otherwise it will only use the CPU.
pipe = pipeline("text-classification", model = model, tokenizer = tokenizer, device = 0)
# Hide the large number of deprecation warnings.
warnings.filterwarnings("ignore", category = DeprecationWarning)

preds = []
# GPU RAM usage continues to grow through inference :( Something is not being deleted correctly.
for i, outputs in enumerate(tqdm(pipe(KeyDataset(raw_dataset, "text"), batch_size = 128),
                                 total = len(raw_dataset))):
    preds.append(outputs)
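For reference, here is the same loop with some GPU memory logging added (just a sketch, assuming the PyTorch backend). It doesn’t fix anything, it only makes it easier to see whether allocated memory is actually growing or whether it is just the allocator’s reserved cache:

import torch

# Same loop as above, but log allocated vs. reserved GPU memory every 10,000 examples.
# memory_allocated() counts live tensors; memory_reserved() also includes blocks the
# caching allocator is holding on to but not currently using.
preds = []
for i, outputs in enumerate(tqdm(pipe(KeyDataset(raw_dataset, "text"), batch_size = 128),
                                 total = len(raw_dataset))):
    preds.append(outputs)
    if i % 10_000 == 0:
        print(f"example {i}: "
              f"allocated = {torch.cuda.memory_allocated() / 1e9:.2f} GB, "
              f"reserved = {torch.cuda.memory_reserved() / 1e9:.2f} GB")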
I haven’t been able to get a version working using the pretokenized version of the dataset. It’s also unfortunate that GPU RAM usage grows over time, because I have to pick a batch size that leaves enough RAM free to avoid an out-of-memory failure near the end of the loop. I don’t know whether this is because outputs needs to be explicitly deleted from GPU RAM or something else.
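In case it’s useful, below is a rough, untested sketch of the pretokenized route that skips the pipeline entirely and calls the model directly. It assumes model is the loaded sequence-classification model object (not just a checkpoint name), that raw_dataset has a "text" column, and it uses DataCollatorWithPadding with a plain DataLoader; I can’t yet say whether it avoids the memory growth:

import torch
from torch.utils.data import DataLoader
from transformers import DataCollatorWithPadding

# Rough, untested sketch: tokenize up front with Dataset.map, then run the model
# directly instead of going through the pipeline.
tokenized = raw_dataset.map(
    lambda batch: tokenizer(batch["text"], truncation = True), batched = True
)
# Keep only the columns the model accepts (input_ids, attention_mask, ...).
tokenized = tokenized.remove_columns(
    [c for c in tokenized.column_names if c not in tokenizer.model_input_names]
)

collator = DataCollatorWithPadding(tokenizer = tokenizer)
loader = DataLoader(tokenized, batch_size = 128, collate_fn = collator)

model.to("cuda")
model.eval()

preds = []
with torch.inference_mode():
    for batch in tqdm(loader):
        batch = {k: v.to("cuda") for k, v in batch.items()}
        logits = model(**batch).logits
        # Move results off the GPU right away so nothing accumulates there.
        preds.extend(logits.argmax(dim = -1).cpu().tolist())

This yields class indices rather than label strings; mapping them through model.config.id2label should recover the labels the pipeline returns.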