CUDA out-of-memory error even when passing the no_cuda argument

Hi, GPU capacity on my university's servers is limited, so the GPUs are in near-constant use. This means that if I try to run a model while someone else is using the GPUs, I get an out-of-memory error (CUDA error: out of memory).

Looking through the documentation, I saw that if I pass no_cuda=True to the TrainingArguments for a Trainer, it won't use GPUs at all, even when they are available. However, when I call trainer.predict(dataset), I still get the CUDA out-of-memory error. Is this expected?
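
As a sanity check, I would expect the resolved device to be CPU with that flag set (note: I believe no_cuda has since been deprecated in favor of use_cpu in newer transformers releases):

from transformers import TrainingArguments

# With no_cuda=True, the Trainer's resolved device should be CPU.
check_args = TrainingArguments(no_cuda=True, output_dir=".")
print(check_args.device)  # expected: cpu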

This is the code I’m using, for reference:

import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

HF_model_name = "../models/test_results_0"

HF_model = AutoModelForSequenceClassification.from_pretrained(HF_model_name, output_hidden_states=True).to("cpu")
HF_tokenizer = AutoTokenizer.from_pretrained(HF_model_name)

args = TrainingArguments(no_cuda=True, output_dir=".")

trainer = Trainer(
    model=HF_model,
    args=args,
    tokenizer=HF_tokenizer,
)

data_files = "../data/fakenews/cleaned_data/text_dataset.tsv"
df = pd.read_csv(data_files, sep="\t")

text = df.loc[0, "text"]

def tokenize_data(example):
    # Pad every example to the model's maximum input length.
    return HF_tokenizer(example["text"], padding="max_length")


# Build a tiny one-example dataset from the first 200 characters.
text_small = text[:200]
text_df = pd.DataFrame([text_small], columns=["text"])
dataset = Dataset.from_pandas(text_df)
dataset =, batched=True)

# This is the call where I still hit the CUDA out-of-memory error.
T = trainer.predict(dataset)
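
For completeness, the only reliable workaround I know of is to hide the GPUs from the process entirely before torch is imported, so no CUDA context can be created at all; but I'd still like to understand whether no_cuda=True should be enough on its own:

import os

# Hide all GPUs from CUDA; this must happen before torch is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch  # imported only after the environment variable is set
print(torch.cuda.is_available())  # should be False with no visible devices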