Hi, the servers at my university don’t have much GPU capacity, so the GPUs end up being in constant use. This means that if I try to run a model while the GPUs are occupied by someone else, I get an out of memory error (CUDA error: out of memory).
Looking through the documentation, I saw that if I pass the argument no_cuda=True to the TrainingArguments for a Trainer, it won’t use the GPUs anymore, even when they are available. However, when I call trainer.predict(dataset), I still get the CUDA out of memory error. Is this expected?
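For reference, this is the minimal check I would expect to confirm the CPU-only setting (I’m assuming the device property on TrainingArguments is the right thing to look at here):

from transformers import TrainingArguments

# With no_cuda=True I would expect the resolved device to be the CPU
cpu_args = TrainingArguments(output_dir=".", no_cuda=True)
print(cpu_args.device)  # expecting this to print "cpu"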
This is the code I’m using, for reference:
import pandas as pd
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments

# Load the fine-tuned model and tokenizer, keeping the model on the CPU
HF_model_name = "../models/test_results_0"
HF_model = AutoModelForSequenceClassification.from_pretrained(HF_model_name, output_hidden_states=True).to("cpu")
HF_tokenizer = AutoTokenizer.from_pretrained(HF_model_name)

# no_cuda=True should keep the Trainer off the GPUs
args = TrainingArguments(no_cuda=True, output_dir=".")
trainer = Trainer(
    model=HF_model,
    args=args,
)

# Build a tiny single-example dataset from the first row of the TSV
data_files = "../data/fakenews/cleaned_data/text_dataset.tsv"
df = pd.read_csv(data_files, sep="\t")
text = df.loc[0, "text"]
print(len(text))
text_small = text[:200]

def tokenize_data(example):
    return HF_tokenizer(example["text"], padding="max_length")

tokenized_text = pd.DataFrame([text_small], columns=["text"])
dataset = Dataset.from_pandas(tokenized_text)
dataset = dataset.map(tokenize_data, batched=True)
T = trainer.predict(dataset)
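In case it matters, the only workaround I can think of is to hide the GPUs from CUDA entirely before anything initializes it, along these lines (I haven’t confirmed this is the recommended approach):

import os

# Setting this before torch/transformers touch CUDA should make the GPUs invisible,
# so everything falls back to the CPU regardless of the Trainer settings.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

Is there a cleaner way to make trainer.predict stay on the CPU?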