HuBERT: RuntimeError: Expected 3-dimensional input for 3-dimensional weight but got 5-dimensional input

I get an error

RuntimeError: Expected 3-dimensional input for 3-dimensional weight [512, 1, 10], but got 5-dimensional input of size [1, 1, 1, 240000, 2] instead

while feeding a WAV audio file to the Wav2Vec2Processor and HubertForCTC:

import os
import torch
from transformers import HubertForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained(
    "facebook/hubert-xlarge-ls960-ft",
    cache_dir=os.getenv("cache_dir", "../../models"))
model = HubertForCTC.from_pretrained(
    "facebook/hubert-xlarge-ls960-ft",
    cache_dir=os.getenv("cache_dir", "../../models"))

for idx, audio in enumerate(train_loader):
    input_values = processor(audio, sampling_rate=sampling_rate, return_tensors="pt").input_values  # Batch size 1
    logits = model(input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.decode(predicted_ids[0])
    print(transcription)
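
From the weight shape [512, 1, 10] in the error, I assume the failing layer is the first Conv1d of the model's feature encoder, which wants a 3-D [batch, 1, time] input. Here is a minimal sketch of that expectation (the layer parameters are read off the error message, not taken from the model config):

import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=1, out_channels=512, kernel_size=10)  # weight shape [512, 1, 10], as in the error
ok = conv(torch.randn(1, 1, 240000))     # 3-D [batch, channels=1, time] works
# conv(torch.randn(1, 1, 1, 240000, 2))  # a 5-D input raises a shape error like the one above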

The audio input comes from this function:

def read_audio(self, audio_path):
    try:
        import soundfile as sf
        y, _ = sf.read(audio_path)
        return y  # [1, 960000]
    except Exception:
        try:
            import librosa
            y, _ = librosa.load(audio_path, sr=self.sr)
            return y
        except Exception:
            return None
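
As far as I understand, sf.read returns a (frames, channels) array for stereo files but a 1-D (frames,) array for mono, while librosa.load downmixes to mono by default and always returns a 1-D array, which would explain the different shapes below. A quick check (the file path is hypothetical):

import soundfile as sf
import librosa

y_sf, sr_sf = sf.read("example_stereo.wav")                 # hypothetical file; stereo gives shape (frames, 2)
y_lr, sr_lr = librosa.load("example_stereo.wav", sr=None)   # mono=True by default, gives shape (frames,)
print(y_sf.shape, y_lr.shape)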

The shapes of the audio batches look like this (with soundfile):

0 torch.Size([1, 960000])
1 torch.Size([1, 240000, 2])

and with librosa:

0 torch.Size([1, 240000])
1 torch.Size([1, 960000])
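
I could also downmix to mono and drop the DataLoader batch dimension before calling the processor. This is only a sketch of what I have in mind, assuming that averaging the two channels is acceptable and that the processor wants a plain 1-D waveform per example:

for idx, audio in enumerate(train_loader):
    waveform = audio.squeeze(0)        # drop the DataLoader batch dim: [960000] or [240000, 2]
    if waveform.dim() == 2:            # stereo -> mono (assumption: averaging the channels)
        waveform = waveform.mean(dim=-1)
    input_values = processor(waveform.numpy(), sampling_rate=sampling_rate,
                             return_tensors="pt").input_values  # should be [1, time]

but I am not sure this is the right fix.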

The testing code is here, and the original question on SF is here.
