ValueError: Unable to create tensor, you should probably activate truncation... but only when training on multiple GPUs or with batch size > 1

I am training a causal language model (Llama 2) with the standard Trainer and letting it handle multiple GPUs (no accelerate or torchrun launch). When I train on a single GPU with batch size 1, everything works fine. However, as soon as I use more than one GPU or more than one example per batch, I get the following error:

ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

It doesn’t seem like this error should have anything to do with training on multiple GPUs or with a batch size larger than 1, but apparently it does. Here is my preprocessing function:

def preprocess_func(batch, tokenizer, max_source_length=512, max_target_length=128):
    inputs = []
    labels = []
    articles = batch["article"]
    summaries = batch["highlights"]

    for article, summary in zip(articles, summaries):
        input_text = article + "\nSummary: "
        target_text = summary + tokenizer.eos_token

        input_ids = tokenizer.encode(input_text, max_length=max_source_length, truncation=True)
        target_ids = tokenizer.encode(target_text, max_length=max_target_length, truncation=True)

        # Combine inputs and targets
        input_ids_combined = input_ids + target_ids

        # Create labels (no prediction needed for the input tokens, so set to -100)
        labels_combined = [-100] * len(input_ids) + target_ids

        inputs.append(input_ids_combined)
        labels.append(labels_combined)

    return {
        'input_ids': inputs,
        'labels': labels
    }
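
For context, the function is applied with datasets.map roughly like this (a sketch, not the exact call from my script; raw_dataset is a placeholder name for the loaded DatasetDict):

from functools import partial

# Hypothetical sketch of the map() call; remove_columns drops the raw text
# columns so that only input_ids and labels reach the data collator.
tokenized_dataset = raw_dataset.map(
    partial(preprocess_func, tokenizer=tokenizer),
    batched=True,
    remove_columns=raw_dataset["train"].column_names,
)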

The data collator and trainer are set up as follows:

# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)

# Trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

The documentation for the data collator states:

Inputs are dynamically padded to the maximum length of a batch if they are not all of the same length.

so unequal lengths of examples in a batch should not be an issue.
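
For what it's worth, the collation step can be checked in isolation by calling the collator on two hand-made features of different lengths (a minimal sketch using the data_collator from above, assuming tokenizer.pad_token is set; I would expect it to raise the same ValueError, since the pre-built ragged labels lists still have to be stacked into one tensor):

# Two toy features of different lengths with a pre-built "labels" key.
features = [
    {"input_ids": [1, 2, 3], "labels": [-100, -100, 3]},
    {"input_ids": [1, 2, 3, 4, 5], "labels": [-100, -100, 3, 4, 5]},
]
batch = data_collator(features)  # I would expect the same ValueError here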

Thanks a lot!

The cause is different from yours, but it seems to be a well-known bug. In some cases it can apparently be avoided by pinning numpy below version 2:

pip install "numpy<2"

Thank you for the answer! However, downgrading numpy to a version below 2 does not fix the problem. I also tried simply padding the data myself to a fixed length in the preprocessing function, which indeed “fixes” training with a batch size > 1 on a single GPU (not a real fix, since I would prefer dynamic padding). With multiple GPUs, however, there is a new error:

RuntimeError: chunk expects at least a 1-dimensional tensor

I could well imagine that it has something to do with package or Python versions. I am running Python 3.12.7. Could that be the issue?
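
For reference, the fixed-length padding I tried looks roughly like this inside the loop of preprocess_func (a sketch, not the exact code; it assumes tokenizer.pad_token_id is set, e.g. via tokenizer.pad_token = tokenizer.eos_token):

# Sketch of the fixed-length padding workaround (not the exact code used).
total_len = max_source_length + max_target_length
pad_id = tokenizer.pad_token_id

input_ids_combined = input_ids_combined[:total_len]
labels_combined = labels_combined[:total_len]

pad_len = total_len - len(input_ids_combined)
input_ids_combined += [pad_id] * pad_len
labels_combined += [-100] * pad_len  # padded positions are ignored by the loss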

There are certainly a lot of libraries that assume Python 3.10, but I don’t think that’s the cause this time; newer versions are less likely to produce errors than older ones.
I found the following issue suspicious instead.

Here is what you can do:

  1. Make the batch size an integer multiple of the number of GPUs, OR
  2. Check whether you are passing a scalar (0-dimensional) tensor in the arguments. You can reshape such a tensor to size 1 with .reshape(1) (see the sketch below).
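
To illustrate point 2, a quick sketch of that reshape (torch.chunk, which DataParallel-style scattering uses under the hood to split a batch across GPUs, cannot split a 0-dimensional tensor):

import torch

x = torch.tensor(3.0)       # 0-dimensional (scalar) tensor, shape torch.Size([])
# torch.chunk(x, 2)         # raises: chunk expects at least a 1-dimensional tensor

x = x.reshape(1)            # now 1-dimensional, shape torch.Size([1])
chunks = torch.chunk(x, 2)  # works: returns (tensor([3.]),)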