hey @sgugger
I wanted to know what’s up with Wav2Vec2 facebook/wav2vec2-base-960h and facebook/wav2vec2-base.
I was looking into the model files and saw that a padding value is specified in the preprocessor config, but when I use the preprocessor it seems to ignore that setting completely, and passing values to the processor call does not help either. Where does the processor get the padding value, if not from the preprocessor config?
My code:
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import soundfile as sf
import torch
# load the processor (feature extractor + tokenizer)
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base")
# x is a 1-D float array of raw 16 kHz audio, loaded earlier (e.g. with sf.read)
input_values3 = processor(x, sampling_rate=16000, return_tensors="pt", padding="longest").input_values  # batch size 1
input_values3
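To show what I mean about the padding value, here is a minimal sketch (built locally with a `Wav2Vec2FeatureExtractor` instead of a downloaded checkpoint, so the config values are my own assumptions, not the ones shipped with the model): the processor delegates audio handling to its feature extractor, and padding a ragged batch shows which `padding_value` gets used.

```python
from transformers import Wav2Vec2FeatureExtractor
import numpy as np

# construct a feature extractor directly, with an explicit padding_value
# (for a real checkpoint this would come from preprocessor_config.json)
fe = Wav2Vec2FeatureExtractor(padding_value=0.0, do_normalize=False)

# a ragged batch: two clips of different lengths
batch = [np.ones(4, dtype=np.float32), np.ones(2, dtype=np.float32)]
out = fe(batch, sampling_rate=16000, padding="longest", return_tensors="np")

print(out.input_values)  # second row is padded with padding_value (0.0)
```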
Also, another question: does each Wav2Vec2 model have its own way of normalizing/processing audio? I passed the same audio to jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn and it gave me different input values. My code:
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset
import soundfile as sf
import torch
# load the processor (feature extractor + tokenizer)
processor = Wav2Vec2Processor.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn")
# x is the same 1-D float array of raw 16 kHz audio as above
input_values = processor(x, sampling_rate=16000, return_tensors="pt", padding="longest").input_values  # batch size 1
input_values
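My guess (unconfirmed, so treat this as a hypothesis) is that the two checkpoints ship different preprocessor configs, e.g. the `do_normalize` flag, so the same audio produces different input values. Toggling `do_normalize` on a locally built feature extractor reproduces that kind of difference without downloading anything:

```python
from transformers import Wav2Vec2FeatureExtractor
import numpy as np

audio = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)

# same audio, two feature extractors differing only in do_normalize
raw = Wav2Vec2FeatureExtractor(do_normalize=False)(
    audio, sampling_rate=16000, return_tensors="np"
).input_values
norm = Wav2Vec2FeatureExtractor(do_normalize=True)(
    audio, sampling_rate=16000, return_tensors="np"
).input_values

print(raw)   # audio passed through unchanged
print(norm)  # zero-mean, unit-variance normalized version
```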