ValueError: Unable to create tensor, you should probably activate truncation... but only when training on multiple GPUs or with batch size > 1

I am training a causal language model (Llama 2) with the standard Trainer and letting it handle multiple GPUs (no accelerate or torchrun launch). When I train on a single GPU with batch size 1, everything works fine. However, as soon as I use more than one GPU or more than one example per batch, I get the following error:

ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

It doesn’t seem like this error should have anything to do with training on multiple GPUs or with a batch size larger than 1, but apparently it does. Here is my preprocessing function:

def preprocess_func(batch, tokenizer, max_source_length=512, max_target_length=128):
    inputs = []
    labels = []
    articles = batch["article"]
    summaries = batch["highlights"]

    for article, summary in zip(articles, summaries):
        input_text = article + "\nSummary: "
        target_text = summary + tokenizer.eos_token

        input_ids = tokenizer.encode(input_text, max_length=max_source_length, truncation=True)
        target_ids = tokenizer.encode(target_text, max_length=max_target_length, truncation=True)

        # Combine inputs and targets
        input_ids_combined = input_ids + target_ids

        # Create labels (no prediction needed for the input tokens, so set to -100)
        labels_combined = [-100] * len(input_ids) + target_ids

        inputs.append(input_ids_combined)
        labels.append(labels_combined)

    return {
        'input_ids': inputs,
        'labels': labels
    }
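
For context, the function is applied with datasets.map roughly like this (a sketch, not the exact call from my script; raw_dataset is a placeholder name for the loaded DatasetDict):

from functools import partial

# Hypothetical sketch of the map() call; remove_columns drops the raw text
# columns so that only input_ids and labels reach the data collator.
tokenized_dataset = raw_dataset.map(
    partial(preprocess_func, tokenizer=tokenizer),
    batched=True,
    remove_columns=raw_dataset["train"].column_names,
)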

The data collator and trainer are set up as follows:

# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)

# Trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["validation"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

The documentation for the data collator states:

Inputs are dynamically padded to the maximum length of a batch if they are not all of the same length.

so unequal lengths of examples in a batch should not be an issue.
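
For what it's worth, the collation step can be checked in isolation by calling the collator on two hand-made features of different lengths (a minimal sketch using the data_collator from above, assuming tokenizer.pad_token is set; I would expect it to raise the same ValueError, since the pre-built ragged labels lists still have to be stacked into one tensor):

# Two toy features of different lengths with a pre-built "labels" key.
features = [
    {"input_ids": [1, 2, 3], "labels": [-100, -100, 3]},
    {"input_ids": [1, 2, 3, 4, 5], "labels": [-100, -100, 3, 4, 5]},
]
batch = data_collator(features)  # I would expect the same ValueError here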

Thanks a lot!

The cause is different from yours, but it seems to be a well-known bug. In some cases it can apparently be avoided by pinning numpy below version 2:

pip install "numpy<2"

Thank you for the answer! However, downgrading numpy to a version below 2 does not fix the problem. I also tried simply padding the data myself to a fixed length in the preprocessing function, which indeed “fixes” training with a batch size > 1 on a single GPU (not a real fix, since I would prefer dynamic padding). With multiple GPUs, however, there is a new error:

RuntimeError: chunk expects at least a 1-dimensional tensor

I could well imagine that it has something to do with package or Python versions. I am running Python 3.12.7. Could that be the issue?
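
For reference, the fixed-length padding I tried looks roughly like this inside the loop of preprocess_func (a sketch, not the exact code; it assumes tokenizer.pad_token_id is set, e.g. via tokenizer.pad_token = tokenizer.eos_token):

# Sketch of the fixed-length padding workaround (not the exact code used).
total_len = max_source_length + max_target_length
pad_id = tokenizer.pad_token_id

input_ids_combined = input_ids_combined[:total_len]
labels_combined = labels_combined[:total_len]

pad_len = total_len - len(input_ids_combined)
input_ids_combined += [pad_id] * pad_len
labels_combined += [-100] * pad_len  # padded positions are ignored by the loss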

There are certainly a lot of libraries that assume Python 3.10, but I don’t think that’s the cause this time; newer versions are less likely to produce errors than older ones.
I found the following issue suspicious instead.

Here is what you can do:

  1. Make the batch size an integer multiple of the number of GPUs, OR
  2. Check whether you are passing a scalar (0-dimensional) tensor in the arguments. You can reshape such a tensor to size 1 with .reshape(1) (see the sketch below).
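
To illustrate point 2, a quick sketch of that reshape (torch.chunk, which DataParallel-style scattering uses under the hood to split a batch across GPUs, cannot split a 0-dimensional tensor):

import torch

x = torch.tensor(3.0)       # 0-dimensional (scalar) tensor, shape torch.Size([])
# torch.chunk(x, 2)         # raises: chunk expects at least a 1-dimensional tensor

x = x.reshape(1)            # now 1-dimensional, shape torch.Size([1])
chunks = torch.chunk(x, 2)  # works: returns (tensor([3.]),)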