Question about Wav2vec2

hey @sgugger

I wanted to know what's up with Wav2vec2 (facebook/wav2vec2-base-960h and facebook/wav2vec2-base). I was looking into the model files and saw that a padding value had been specified, but when I use the preprocessor it seems to totally ignore the instructions in the preprocessor config, and passing values to the processor call does not help either. Where does the processor get the padding value, if not from the preprocessor config?

My code :

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import soundfile as sf
import torch

# load model and tokenizer
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base")

# x is a 1-D float array of raw audio at 16 kHz (e.g. loaded with sf.read)
input_values3 = processor(x, return_tensors="pt", padding="longest").input_values  # Batch size 1

input_values3
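For reference, here is a minimal sketch (in plain Python, not the actual library code) of what I understand `padding="longest"` to do conceptually, assuming the fill value comes from the feature extractor's `padding_value` attribute in `preprocessor_config.json`:

```python
def pad_longest(batch, padding_value=0.0):
    # Pad every sequence in the batch to the length of the longest one,
    # filling with padding_value (assumed to default to 0.0 in the config).
    longest = max(len(seq) for seq in batch)
    return [seq + [padding_value] * (longest - len(seq)) for seq in batch]

batch = [[0.1, 0.2, 0.3], [0.4]]
pad_longest(batch)  # second sequence is extended with two pad values
```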

Also, another question: does each Wav2vec2 model have its own way of normalizing/processing audio? I passed the same audio to jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn and it gave me different results. My code:

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import soundfile as sf
import torch

# load model and tokenizer
processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn")

# x is the same 1-D audio array as above
input_values = processor(x, return_tensors="pt", padding="longest").input_values  # Batch size 1

input_values
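If I understand correctly, one source of the difference could be the `do_normalize` flag in each checkpoint's preprocessor config. A sketch of the zero-mean, unit-variance normalization I believe the feature extractor applies when `do_normalize=True` (the `1e-7` epsilon is my assumption, not confirmed from the source):

```python
import math

def zero_mean_unit_var(seq, eps=1e-7):
    # Assumed normalization: subtract the mean, divide by the std deviation.
    mean = sum(seq) / len(seq)
    var = sum((v - mean) ** 2 for v in seq) / len(seq)
    return [(v - mean) / math.sqrt(var + eps) for v in seq]

raw = [0.1, 0.2, 0.3]
zero_mean_unit_var(raw)  # roughly [-1.22, 0.0, 1.22]
```

So a checkpoint with `do_normalize=True` and one with `do_normalize=False` would return different `input_values` for the same audio, even before any model weights are involved.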

I was playing around with Wav2vec2 a bit more and noticed a few more things that seem off. First, in the Wav2Vec2Processor, whenever we pad we then apply the normalization, which in turn changes the padded value per sequence. Would it not make more sense to leave the padded value as is and apply the normalization before the padding?

Also, some parameters like padding_value and do_normalize do not seem to work either. I tried setting do_normalize=False but it still normalized. Likewise with padding_value: I tried passing a few different values, but the output from the processor did not change one bit. I think this is because we do not pass the padding_value on to the pad function.
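To make the pad-then-normalize concern concrete, here is a small numeric sketch (using the same assumed mean/variance normalization as above, which may not match the library exactly): after normalization, the positions that were padded with 0.0 are no longer 0.0, and their value depends on the sequence they belong to.

```python
import math

def normalize(seq, eps=1e-7):
    # Assumed zero-mean, unit-variance normalization applied after padding.
    mean = sum(seq) / len(seq)
    var = sum((v - mean) ** 2 for v in seq) / len(seq)
    return [(v - mean) / math.sqrt(var + eps) for v in seq]

padded = [0.4, 0.0, 0.0]  # one short sequence padded with 0.0 to length 3
normalize(padded)         # the former pad positions end up nonzero
```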

Shouldn’t we pass the other parameters here as well?

Let me know where I am wrong. I am not very experienced with ML/DL models yet, so all of this might just be my misunderstanding.