Trainer() issue: AttributeError: 'str' object has no attribute 'dtype'

System Info

  • transformers version: 4.30.0.dev0
  • Platform: Linux-5.4.204-ql-generic-12.0-19-x86_64-with-glibc2.31
  • Python version: 3.11.3
  • Huggingface_hub version: 0.14.1
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.0.1 (True)
    Versions of relevant libraries:
    [pip3] numpy==1.23.0
    [pip3] torch==2.0.1
    [pip3] torchaudio==2.0.2
    [pip3] torchvision==0.15.2
    [conda] blas 1.0 mkl
    [conda] ffmpeg 4.3 hf484d3e_0 pytorch
    [conda] mkl 2023.1.0 h6d00ec8_46342
    [conda] mkl-service 2.4.0 py311h5eee18b_1
    [conda] mkl_fft 1.3.6 py311ha02d727_1
    [conda] mkl_random 1.2.2 py311ha02d727_1
    [conda] numpy 1.23.0 pypi_0 pypi
    [conda] pytorch 2.0.1 py3.11_cuda11.8_cudnn8.7.0_0 pytorch
    [conda] pytorch-cuda 11.8 h7e8668a_5 pytorch
    [conda] pytorch-mutex 1.0 cuda pytorch
    [conda] torchaudio 2.0.2 py311_cu118 pytorch
    [conda] torchtriton 2.0.0 py311 pytorch
    [conda] torchvision 0.15.2 py311_cu118 pytorch

Who can help?

@sgugger @sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

ERROR:

train_result = trainer.train(resume_from_checkpoint=checkpoint)
…
python3.11/site-packages/transformers/feature_extraction_sequence_utils.py", line 220, in pad
if value.dtype is np.dtype(np.float64):
^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'dtype'

I am not sure which element of the dataset is being read as a 'str'.
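One way to narrow this down is to list every string-valued field in a single example, since the traceback shows `pad` accessing `value.dtype` on a value that turned out to be a `str`. The following is only a minimal sketch: `find_str_fields` and `sample` are hypothetical names, and `sample` is a shortened stand-in for `raw_datasets['train'][0]` with the audio array replaced by a plain list.

```python
def find_str_fields(example, prefix=""):
    """Return the dotted path of every str-valued leaf in a nested dict."""
    hits = []
    for key, value in example.items():
        path = f"{prefix}{key}"
        if isinstance(value, str):
            hits.append(path)
        elif isinstance(value, dict):
            # Recurse into nested features such as the 'audio' dict.
            hits.extend(find_str_fields(value, prefix=f"{path}."))
    return hits


# Hypothetical stand-in for raw_datasets['train'][0]; values are shortened.
sample = {
    "audio": {
        "path": "audio1.wav",
        "array": [0.02072144, 0.02767944],
        "sampling_rate": 16000,
    },
    "label": "happy",
}

print(find_str_fields(sample))  # ['audio.path', 'label']
```

Any field reported here that is later passed to the feature extractor's `pad` would be a candidate for the failing value.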

1. OFFICIAL SCRIPT: transformers/examples/pytorch/audio-classification/run_audio_classification.py

2. LOADED DATASET:

DatasetDict({
    train: Dataset({
        features: ['audio', 'label'],
        num_rows: 1280
    })
    validation: Dataset({
        features: ['audio', 'label'],
        num_rows: 160
    })
    test: Dataset({
        features: ['audio', 'label'],
        num_rows: 160
    })
})

3. logger.info(raw_datasets['train'][0])

{'audio': {'path': '/transformers/examples/pytorch/audio-classification/s/data/s/s/train/audio1.wav', 'array': array([0.02072144, 0.02767944, 0.03274536, ..., 0.00079346, 0.00088501,
0.00149536]), 'sampling_rate': 16000}, 'label': 'happy'}
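The sample above shows the label stored as a plain string ('happy'). If that string is the value reaching `pad` (an assumption, not something the traceback confirms), converting labels to integer ids before training would be one thing to try. A minimal sketch of the mapping follows; `build_label_maps` is a hypothetical helper, and every label except 'happy' is made up for illustration.

```python
def build_label_maps(labels):
    """Build deterministic label <-> id mappings from a list of string labels."""
    names = sorted(set(labels))
    label2id = {name: idx for idx, name in enumerate(names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


# Hypothetical labels; only 'happy' appears in the issue itself.
labels = ["happy", "sad", "happy", "angry"]

label2id, id2label = build_label_maps(labels)
encoded = [label2id[name] for name in labels]

print(label2id)  # {'angry': 0, 'happy': 1, 'sad': 2}
print(encoded)   # [1, 2, 1, 0]
```

With the `datasets` library, `Dataset.class_encode_column("label")` should perform an equivalent conversion to a `ClassLabel` feature, though I have not checked that against this exact setup.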

@mariosasko any idea about this?

Expected behavior

The dataset should load into the model and training should start when calling train_result = trainer.train(resume_from_checkpoint=checkpoint).