I am training a causal language model (Llama2) using the standard Trainer to handle multiple GPUs (no accelerate or torchrun). When I train on a single GPU with batch size 1, everything works fine. However, as soon as I use more than one GPU or more than one example per batch, I get the following error:
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
It doesn't seem like this error should have anything to do with training on multiple GPUs or with batches of more than one example, but apparently it does. Here is my preprocessing function:
def preprocess_func(batch, tokenizer, max_source_length=512, max_target_length=128):
    inputs = []
    labels = []
    articles = batch["article"]
    summaries = batch["highlights"]
    for article, summary in zip(articles, summaries):
        input_text = article + "\nSummary: "
        target_text = summary + tokenizer.eos_token
        input_ids = tokenizer.encode(input_text, max_length=max_source_length, truncation=True)
        target_ids = tokenizer.encode(target_text, max_length=max_target_length, truncation=True)
        # Combine inputs and targets
        input_ids_combined = input_ids + target_ids
        # Create labels (no prediction needed for the input tokens, so set to -100)
        labels_combined = [-100] * len(input_ids) + target_ids
        inputs.append(input_ids_combined)
        labels.append(labels_combined)
    return {
        "input_ids": inputs,
        "labels": labels,
    }
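For context, this is roughly how I apply the preprocessing (a minimal sketch; the exact load_dataset call and remove_columns arguments are assumptions, not copied from my script):

from functools import partial
from datasets import load_dataset

# Assumed setup: a CNN/DailyMail-style dataset with "article" and "highlights" columns
raw_dataset = load_dataset("cnn_dailymail", "3.0.0")

# Apply preprocess_func in batched mode and drop the original text columns,
# so only input_ids and labels are left for the Trainer
tokenized_dataset = raw_dataset.map(
    partial(preprocess_func, tokenizer=tokenizer),
    batched=True,
    remove_columns=raw_dataset["train"].column_names,
)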
The data collator and Trainer are set up as follows:
# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)

# Trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
The documentation for DataCollatorForLanguageModeling states:
Inputs are dynamically padded to the maximum length of a batch if they are not all of the same length.
so unequal lengths of examples in a batch should not be an issue.
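In case it helps narrow things down, here is a minimal standalone sketch of how I understand the collator gets called with batch size > 1, i.e. directly on two features of different lengths (the checkpoint name and token IDs are placeholders, and I set a pad token because the Llama2 tokenizer has none by default):

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Placeholder checkpoint; set a pad token so the collator can pad at all
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Two features shaped like the output of preprocess_func, with different lengths
# (the token IDs are made up for illustration)
features = [
    {"input_ids": [1, 306, 763, 19273, 29889], "labels": [-100, -100, 19273, 29889, 2]},
    {"input_ids": [1, 306, 763], "labels": [-100, -100, 2]},
]

# This is essentially what the Trainer does per batch; it should show whether
# the labels column gets padded along with input_ids
batch = data_collator(features)
print({k: v.shape for k, v in batch.items()})

With batch size 1 there is nothing to pad, which would at least be consistent with single-example batches working for me.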
Thanks a lot!