Hi, I am trying to use DeepSpeed with the Hugging Face Trainer, using the code below. However, it fails with the following error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! Can someone help me resolve this issue?
from transformers import Trainer, TrainingArguments

# script_args, outputdir, model, tokenizer and tokenized_datasets are defined earlier in the script.
training_args = TrainingArguments(
    output_dir=outputdir,
    per_device_train_batch_size=script_args.batch_size,
    deepspeed="deepspeed_config.json",  # path to the DeepSpeed config file
    bf16=True,
    bf16_full_eval=True,
    gradient_accumulation_steps=script_args.gradient_accumulation_steps,
    learning_rate=script_args.learning_rate,
    logging_steps=script_args.logging_steps,
    num_train_epochs=script_args.num_train_epochs,
    max_steps=script_args.max_steps,
    report_to=script_args.log_with,
    save_steps=script_args.save_steps,
    save_total_limit=script_args.save_total_limit,
    push_to_hub=script_args.push_to_hub,
    hub_model_id=script_args.hub_model_id,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["eval"],
    tokenizer=tokenizer,
)
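
The error message means that a single operation received tensors from both cuda:0 and cuda:1. One way to narrow this down is to check where the model's parameters live before the Trainer is built. The following is only a diagnostic sketch, not a confirmed fix: the helper names `parameter_devices` and `is_spread_across_devices` are hypothetical (they are not part of transformers or DeepSpeed), and the code assumes `model` is a regular PyTorch module exposing `named_parameters()`.

```python
from collections import Counter


def parameter_devices(model):
    """Count parameters per device.

    Hypothetical helper: works with any object that exposes
    named_parameters() yielding (name, parameter) pairs, where each
    parameter has a .device attribute (as torch.nn.Module does).
    """
    return Counter(str(param.device) for _, param in model.named_parameters())


def is_spread_across_devices(model):
    """Return True when the parameters sit on more than one device."""
    return len(parameter_devices(model)) > 1
```

Calling `print(parameter_devices(model))` just before `Trainer(...)` shows how many parameters sit on each device. If more than one device appears at that point, the model was probably split across GPUs when it was loaded (for example with `device_map="auto"`, which is an assumption about loading code not shown here). If only one device appears, the cause may lie elsewhere, such as how the script is launched, which this check does not cover.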