Hi, relatively new user of Hugging Face here, trying to do multi-label classification, and basing my code off this example.
I have put my own data into a DatasetDict format as follows:
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
df2 = df[['text_column', 'answer1', 'answer2']].head(1000).copy()
df2['text_column'] = df2['text_column'].astype(str)
dataset = Dataset.from_pandas(df2)
# train/test/validation split (90/5/5)
train_testvalid = dataset.train_test_split(test_size=0.1)
test_valid = train_testvalid["test"].train_test_split(test_size=0.5)
# collect the three splits into a DatasetDict
datasets = DatasetDict({
    "train": train_testvalid["train"],
    "test": test_valid["test"],
    "valid": test_valid["train"],
})
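For reference, with 1000 rows the two chained splits above work out to roughly 90/5/5; a minimal arithmetic sketch (the exact counts assume the default rounding, where the held-out side is about n * test_size):

```python
# Sketch of the split sizes produced by the two train_test_split calls,
# assuming 1000 input rows as in the post.
n = 1000
n_testvalid = int(n * 0.1)       # first split holds out ~100 rows
n_train = n - n_testvalid        # remaining 900 rows for training
n_test = int(n_testvalid * 0.5)  # second split: ~50 rows for test
n_valid = n_testvalid - n_test   # ~50 rows for validation
print(n_train, n_test, n_valid)
```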
Later, I load the model using
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
problem_type="multi_label_classification",
num_labels=len(labels),
id2label=id2label,
label2id=label2id).to(device)
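(For completeness, here is a hedged sketch of how labels, id2label, and label2id might be built; the label names are taken from the column names in the post and may differ in your actual setup.)

```python
# Hypothetical label list based on the two answer columns from the post.
labels = ["answer1", "answer2"]
# Map integer class ids to label names, and vice versa, as expected by
# AutoModelForSequenceClassification's id2label / label2id arguments.
id2label = {i: label for i, label in enumerate(labels)}
label2id = {label: i for i, label in enumerate(labels)}
```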
and then check that the model is on the GPU using next(model.parameters()).is_cuda; if I comment out .to(device), the model stays on the CPU.
My problem comes when it's time to train the model as follows:
trainer = Trainer(
    model,
    args,
    train_dataset=encoded_dataset["train"].to(device),
    eval_dataset=encoded_dataset["valid"].to(device),
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
which causes the error:
AttributeError: 'Dataset' object has no attribute 'to'
But if I don't try to send the train and eval datasets to the GPU, I get the error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
So my question is: how can I send these datasets to the GPU, where the model already is, so that I can train and validate the model efficiently?
Thank you!