Issue "got multiple values for keyword argument" for Wav2Vec2 tokenizer

There seems to be an issue with using Wav2Vec2Processor to call the tokenizer. The following sample code (minimal reproduction) is with transformers 4.49.0 on macOS Sequoia.

from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-960h")

with processor.as_target_processor():
    processor("HELLO WORLD", return_tensors="pt")

This results in the error

got multiple values for keyword argument 'return_attention_mask'

This is thrown from transformers/models/wav2vec2/processing_wav2vec2.py, line 104. Examining this file, I see (line 103 onward):

        if self._in_target_context_manager:
            return self.current_processor(
                audio,
                **output_kwargs["audio_kwargs"],
                **output_kwargs["text_kwargs"],
                **output_kwargs["common_kwargs"],
            )

The issue appears to be that "return_attention_mask" and "return_tensors" are passed multiple times:

  • 'return_attention_mask' appears in both output_kwargs["text_kwargs"] and output_kwargs["audio_kwargs"]
  • 'return_tensors' appears in all 4 *_kwargs dicts
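This is ordinary Python behavior when unpacking two dicts that share a key into one call: the duplicate raises TypeError even when both dicts hold the identical value. A minimal standalone demonstration (call_processor here is a stand-in, not the transformers code):

```python
def call_processor(audio, **kwargs):
    """Stand-in for self.current_processor(...) in processing_wav2vec2.py."""
    return kwargs

# Both kwargs groups carry the same keys, as in the bug report.
audio_kwargs = {"return_tensors": "pt", "return_attention_mask": True}
text_kwargs = {"return_tensors": "pt", "return_attention_mask": True}

try:
    # Unpacking two dicts that share a key is rejected at call time,
    # regardless of the values matching.
    call_processor([0.0], **audio_kwargs, **text_kwargs)
except TypeError as exc:
    print(exc)  # ... got multiple values for keyword argument ...
```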

Dropping into the debugger, if I empty out only output_kwargs["audio_kwargs"], the "multiple values for keyword argument 'return_attention_mask'" error goes away, but the code then throws a new error that it got multiple values for 'return_tensors'. Emptying out both the audio and common kwargs removes the errors entirely.
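A fix on the library side would presumably need to deduplicate the kwargs before forwarding them. As a rough sketch of that idea (my own illustration, not the actual transformers code), keeping the first value seen for each key:

```python
def merge_kwargs(*groups):
    """Merge several kwargs dicts, keeping the first value seen per key.

    Rough sketch of one dedup strategy; the real fix in transformers
    may resolve conflicts differently (e.g. prefer text_kwargs).
    """
    merged = {}
    for group in groups:
        for key, value in group.items():
            merged.setdefault(key, value)
    return merged

audio_kwargs = {"return_tensors": "pt", "sampling_rate": 16000}
text_kwargs = {"return_tensors": "pt", "padding": True}

merged = merge_kwargs(audio_kwargs, text_kwargs)
# A single merged dict can then be unpacked safely:
#   self.current_processor(audio, **merged)
print(merged)
```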

Furthermore, as_target_processor() triggers a warning that it is deprecated, asking that I "process [my] labels by using the argument text of the regular __call__ method". But if I remove the as_target_processor() line and instead call directly:

processor(text="HELLO WORLD", return_tensors="pt")

I find:

  1. This does not resolve the multiple values error
  2. If I now bypass the error manually in the debugger by emptying out the audio and common kwargs, I get a later error that the tokenizer never receives the text string, which is true, since self.current_processor() only receives the audio

For now, calling the tokenizer directly appears to bypass both issues:

processor.tokenizer("HELLO WORLD", return_tensors="pt")

What am I doing wrong?


I was able to reproduce this in my local environment. Something similar seems to happen with other models' processors in 4.49.0. There doesn't appear to be an existing GitHub issue for this Wav2Vec2 error, so it might be better to raise one just to be sure.