Trainer() issue: AttributeError: 'str' object has no attribute 'dtype'

fkov · June 11, 2023, 1:42pm

System Info

transformers version: 4.30.0.dev0
Platform: Linux-5.4.204-ql-generic-12.0-19-x86_64-with-glibc2.31
Python version: 3.11.3
Huggingface_hub version: 0.14.1
Safetensors version: 0.3.1
PyTorch version (GPU?): 2.0.1 (True)
Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] torch==2.0.1
[pip3] torchaudio==2.0.2
[pip3] torchvision==0.15.2
[conda] blas 1.0 mkl
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py311h5eee18b_1
[conda] mkl_fft 1.3.6 py311ha02d727_1
[conda] mkl_random 1.2.2 py311ha02d727_1
[conda] numpy 1.23.0 pypi_0 pypi
[conda] pytorch 2.0.1 py3.11_cuda11.8_cudnn8.7.0_0 pytorch
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 2.0.2 py311_cu118 pytorch
[conda] torchtriton 2.0.0 py311 pytorch
[conda] torchvision 0.15.2 py311_cu118 pytorch

Who can help?

@sgugger @sanchit-gandhi

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, …)
My own task or dataset (give details below)

Reproduction

ERROR:

train_result = trainer.train(resume_from_checkpoint=checkpoint)
…
python3.11/site-packages/transformers/feature_extraction_sequence_utils.py", line 220, in pad
if value.dtype is np.dtype(np.float64):
^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'dtype'

I am not sure which element of the dataset is being read as a 'str'.

1. OFFICIAL SCRIPT: transformers/examples/pytorch/audio-classification/run_audio_classification.py

2. LOADED DATASET:

DatasetDict({
    train: Dataset({
        features: ['audio', 'label'],
        num_rows: 1280
    })
    validation: Dataset({
        features: ['audio', 'label'],
        num_rows: 160
    })
    test: Dataset({
        features: ['audio', 'label'],
        num_rows: 160
    })
})

3. logger.info(raw_datasets['train'][0])

{'audio': {'path': '/transformers/examples/pytorch/audio-classification/s/data/s/s/train/audio1.wav', 'array': array([0.02072144, 0.02767944, 0.03274536, …, 0.00079346, 0.00088501,
0.00149536]), 'sampling_rate': 16000}, 'label': 'happy'}
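
To narrow down which element is a str, a small helper can walk one example and list every string-valued field. This is only a minimal sketch: find_string_fields is a hypothetical helper, not part of transformers or datasets, and the sample below is a shortened stand-in for the record printed above (the path and array are placeholders).

```python
def find_string_fields(example, prefix=""):
    """Return the dotted paths of every str-valued leaf in a nested dict."""
    paths = []
    for key, value in example.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested features such as the 'audio' dict.
            paths.extend(find_string_fields(value, prefix=f"{path}."))
        elif isinstance(value, str):
            paths.append(path)
    return paths


# Shortened stand-in for raw_datasets['train'][0]; values are placeholders.
sample = {
    "audio": {
        "path": "train/audio1.wav",
        "array": [0.02072144, 0.02767944, 0.03274536],
        "sampling_rate": 16000,
    },
    "label": "happy",
}

string_fields = find_string_fields(sample)
print(string_fields)  # ['audio.path', 'label']
```

In this sample the only strings are the audio path and the label. If the label column reaches the padding step unchanged, it would be a plausible source of the 'str' in the error, though that is an assumption and not something the traceback confirms.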

@mariosasko any idea about this?

Expected behavior

The dataset should load into the model and training should start when train_result = trainer.train(resume_from_checkpoint=checkpoint) is called.
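
One possible way to get there, assuming the string label ('happy') is what ends up being passed to pad, is to encode the labels as integer ids before training. The sketch below uses plain Python only; build_label_maps and encode_labels are hypothetical helpers, and 'sad' is an invented second class used purely for illustration.

```python
def build_label_maps(labels):
    """Build label-name <-> integer-id mappings from an iterable of names."""
    names = sorted(set(labels))
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


def encode_labels(examples, label2id, label_column="label"):
    """Return copies of the examples with the string label replaced by its id."""
    return [{**ex, label_column: label2id[ex[label_column]]} for ex in examples]


# Toy rows; 'sad' is an assumed class, not taken from the dataset above.
rows = [{"label": "happy"}, {"label": "sad"}, {"label": "happy"}]

label2id, id2label = build_label_maps(row["label"] for row in rows)
encoded = encode_labels(rows, label2id)
print(label2id)
print([row["label"] for row in encoded])
```

The datasets library's ClassLabel feature type serves the same purpose for a real Dataset. Whether this resolves the error depends on the assumption above being correct.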
