Wav2Vec2Model: Expected 3-dimensional input for 3-dimensional weight [512, 10, 10], but got 4-dimensional input of size [16, 1, 10, 1000] instead

I’m trying to use Wav2Vec2Model with multi-channel input. To do this, I have replaced the first layer of the feature_extractor in Wav2Vec2Model.

Code:

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model, Wav2Vec2Config

configuration = Wav2Vec2Config()
model = Wav2Vec2Model(configuration)

# Replace the first conv layer so it accepts 10 input channels instead of 1
model.feature_extractor.conv_layers[0].conv = nn.Conv1d(10, 512, kernel_size=10, stride=5, bias=False)

inputs = torch.rand(16, 10, 1000)
out = model(inputs)

Input Shape: (16, 10, 1000), where 16 is the batch size, 10 is the number of channels, and 1000 is the sequence length.

Error:

RuntimeError: Expected 3-dimensional input for 3-dimensional weight [512, 10, 10], but got 4-dimensional input of size [16, 1, 10, 1000] instead
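As far as I can tell, the extra dimension comes from the model itself: Wav2Vec2's feature encoder assumes `input_values` has shape (batch, length) and inserts the channel axis internally via `input_values[:, None]`. A 3-D multi-channel tensor therefore becomes 4-D before it ever reaches the replaced Conv1d, which is a minimal sketch of what seems to happen:

```python
import torch

# Wav2Vec2's feature encoder expects (batch, length) and adds the
# channel axis itself, roughly equivalent to input_values[:, None].
x = torch.rand(16, 10, 1000)   # (batch, channels, length) -- my input
x_unsqueezed = x[:, None]      # what the model effectively does internally
print(x_unsqueezed.shape)      # torch.Size([16, 1, 10, 1000]) -- 4-D, hence the error
```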

Any solutions?
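One workaround I have considered (a sketch, not a verified fix) is to leave Wav2Vec2Model unmodified and instead learn a linear mix of the 10 channels down to mono with a 1x1 Conv1d, so the standard (batch, length) interface still applies; the `channel_mixer` name here is my own, not part of the library:

```python
import torch
import torch.nn as nn

# Hypothetical front-end: mix 10 input channels down to a single channel
# so the unmodified Wav2Vec2Model interface, (batch, length), can be used.
channel_mixer = nn.Conv1d(10, 1, kernel_size=1, bias=False)

x = torch.rand(16, 10, 1000)        # (batch, channels, length)
mono = channel_mixer(x).squeeze(1)  # (batch, length) -- what the model expects
print(mono.shape)                   # torch.Size([16, 1000])
```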


this problem seems to be related to