Hi @omar47. I’m not sure we have the same original issue.
I can see two separate issues that may cause this:
- Passing the class `Wav2Vec2FeatureExtractor` itself to `DataCollatorForWav2Vec2Pretraining` instead of an instance. Solution: instantiate the feature extractor before passing it to the data collator:
```python
from transformers import Wav2Vec2Config, Wav2Vec2FeatureExtractor, Wav2Vec2ForPreTraining

config = Wav2Vec2Config.from_pretrained(args.model_path)  # the constructor expects a config, not a path
model = Wav2Vec2ForPreTraining(config)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(args.model_path)
data_collator = DataCollatorForWav2Vec2Pretraining(  # class defined in the pretraining script
    model=model,
    feature_extractor=feature_extractor,  # an instance, not the class
)
```
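If it helps to sanity-check, below is a rough sketch of how the collator instance then gets handed to the dataloader (the batch size is a placeholder, and `vectorized_datasets` is the mapped dataset from the second point below):

```python
from torch.utils.data import DataLoader

# The collator *instance* is called on each list of examples; its
# feature_extractor must also be an instance so that padding works.
train_dataloader = DataLoader(
    vectorized_datasets["train"],
    collate_fn=data_collator,
    batch_size=8,  # placeholder value
    shuffle=True,
)
batch = next(iter(train_dataloader))  # should yield a padded "input_values" tensor
```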
- The dataset is still a dictionary:
This problem reappeared for me because I did not remove the unused columns in the preprocessing step (using `prepare_dataset`). Because of this, the data is still in dictionary form, which I think is not what the padding function in the data collator expects. Make sure you keep the line `remove_columns=raw_datasets["train"].column_names` when mapping the `prepare_dataset` function onto your dataset:
```python
vectorized_datasets = raw_datasets.map(
    prepare_dataset,
    num_proc=args.preprocessing_num_workers,
    remove_columns=raw_datasets["train"].column_names,
    cache_file_names=cache_file_names,
)
```
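For completeness, here is a minimal sketch of what my `prepare_dataset` roughly looks like, assuming the raw dataset has an `audio` column (adapt the column name to your data). The key point is that it only adds `input_values`, so the original columns stick around unless `remove_columns` drops them:

```python
def prepare_dataset(batch):
    # Minimal sketch: turn raw audio into model inputs.
    sample = batch["audio"]  # assumed column holding {"array", "sampling_rate"}
    inputs = feature_extractor(
        sample["array"],
        sampling_rate=sample["sampling_rate"],
    )
    batch["input_values"] = inputs.input_values[0]
    return batch
```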
These are the two fixes I could identify that alleviated the issue in my case. Good luck!