Wav2Vec2 - ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length

lnxdx · November 22, 2023, 9:32pm

Running the following code raises an error. What is the problem?

import torch
from transformers import Wav2Vec2FeatureExtractor

feat_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
fake_batch = [{'input_values': torch.rand([1, 3])}, {'input_values': torch.rand([1, 4])}]
feat_extractor.pad(fake_batch, padding=True, return_tensors='pt')

ValueError                                Traceback (most recent call last)
File C:\Program Files\Python310\lib\site-packages\transformers\feature_extraction_utils.py:175, in BatchFeature.convert_to_tensors(self, tensor_type)
    174 if not is_tensor(value):
--> 175     tensor = as_tensor(value)
    177     self[key] = tensor

File C:\Program Files\Python310\lib\site-packages\transformers\feature_extraction_utils.py:148, in BatchFeature.convert_to_tensors.<locals>.as_tensor(value)
    147 if isinstance(value, (list, tuple)) and len(value) > 0 and isinstance(value[0], np.ndarray):
--> 148     value = np.array(value)
    149 return torch.tensor(value)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 1) + inhomogeneous part.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In [126], line 3
      1 fake_batch = [{'input_values' : torch.rand([1, 112000])}, {'input_values' : torch.rand([1, 86000])}]
      2 fake_batch = [{'input_values' : torch.rand([1, 3])}, {'input_values' : torch.rand([1, 4])}]
----> 3 feat_extractor.pad(fake_batch, padding=True, return_tensors = 'pt')

File C:\Program Files\Python310\lib\site-packages\transformers\feature_extraction_sequence_utils.py:224, in SequenceFeatureExtractor.pad(self, processed_features, padding, max_length, truncation, pad_to_multiple_of, return_attention_mask, return_tensors)
    221             value = value.astype(np.float32)
    222         batch_outputs[key].append(value)
--> 224 return BatchFeature(batch_outputs, tensor_type=return_tensors)

File C:\Program Files\Python310\lib\site-packages\transformers\feature_extraction_utils.py:78, in BatchFeature.__init__(self, data, tensor_type)
     76 def __init__(self, data: Optional[Dict[str, Any]] = None, tensor_type: Union[None, str, TensorType] = None):
     77     super().__init__(data)
---> 78     self.convert_to_tensors(tensor_type=tensor_type)

File C:\Program Files\Python310\lib\site-packages\transformers\feature_extraction_utils.py:181, in BatchFeature.convert_to_tensors(self, tensor_type)
    179         if key == "overflowing_values":
    180             raise ValueError("Unable to create tensor returning overflowing values of different lengths. ")
--> 181         raise ValueError(
    182             "Unable to create tensor, you should probably activate padding "
    183             "with 'padding=True' to have batched tensors with the same length."
    184         )
    186 return self

ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.

My system configuration:
Windows 11
Python 3.11.5
torch 2.1.1+cu121
transformers 4.31.0
numpy 1.24.3

lnxdx · November 27, 2023, 11:10am

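Each input_values entry passed to feat_extractor.pad should be one-dimensional, with shape (num_samples,) rather than (1, num_samples). With the extra leading dimension, every item apparently counts as length 1 (note the detected shape (2, 1) in the NumPy error), so nothing gets padded and the arrays cannot be stacked into one tensor. I tried fake_batch = [{'input_values' : torch.rand(3)}, {'input_values' : torch.rand(4)}] and it worked.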
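If the data already carries a leading dimension of size 1, one way to apply this fix is to squeeze it away before padding. The following is a minimal sketch, not a definitive recipe. It assumes a default-constructed Wav2Vec2FeatureExtractor (used here only to avoid downloading the checkpoint config; the original post loads "facebook/wav2vec2-base-960h"), and the names flat_batch and padded_shape are illustrative.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor

# Assumption: a default-constructed extractor stands in for the
# pretrained one so the sketch runs without a download.
feat_extractor = Wav2Vec2FeatureExtractor()

# Same fake batch as in the question: each item has shape (1, num_samples).
fake_batch = [
    {"input_values": torch.rand([1, 3])},
    {"input_values": torch.rand([1, 4])},
]

# Drop the leading dimension so each item has shape (num_samples,).
flat_batch = [
    {"input_values": item["input_values"].squeeze(0)} for item in fake_batch
]

# With one-dimensional inputs, the shorter item is padded to the longest one.
padded = feat_extractor.pad(flat_batch, padding=True, return_tensors="pt")
padded_shape = tuple(padded["input_values"].shape)
print(padded_shape)  # (2, 4)
```

The batch dimension is then created by pad itself, so there is no need to add it per item beforehand.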
