Labels in Audio Frame classification task (Wav2Vec2 For Audio Frame Classification)

maureendss · August 7, 2023, 1:10pm

I’m working on fine-tuning a speech model (either Wav2Vec or HuBERT) to classify speech at the frame level (e.g., every 20ms of audio must be classified). Specifically, I’m looking to use the Wav2Vec2ForAudioFrameClassification class, but I’m uncertain about the shape and format of the labels required.

For this task, I’d like to input a torch tensor with a length corresponding to the audio length divided by the frame length (e.g., 20ms), containing binary values of 0s or 1s. Is this approach feasible? I’m having difficulty understanding the precise requirements for the labels.

Does anyone have an example of using Wav2Vec2ForAudioFrameClassification, or can someone guide me on the correct shape and formatting for the labels? Any help would be appreciated, thanks a lot!

bnestor · January 7, 2025, 3:12am

The labels you feed into the model should have shape [batch_size, sequence_length*num_classes], i.e. one-hot vectors flattened along the last dimension. Internally, the model recovers the class indices by calling torch.argmax(labels.view(-1, self.num_classes), axis=1).long()

The snippet below will work for a binary classification problem.

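import torch

num_classes = 2

# labels: integer class indices (0 or 1) of shape [batch_size, sequence_length]
check_labels = labels.long()  # keep the original indices for the check below
labels = torch.nn.functional.one_hot(labels.long(), num_classes=num_classes)

# flatten the one-hot vectors to [batch_size, sequence_length * num_classes]
labels = labels.view(labels.shape[0], labels.shape[1] * labels.shape[2])

# sanity check: the argmax below is the line used in the model,
# so it should give back the original class indices.
for i in range(len(check_labels)):
    check_against = torch.argmax(labels[i].view(-1, num_classes), axis=1).long()
    torch.testing.assert_close(check_labels[i], check_against, rtol=1e-5, atol=1e-5)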
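
One thing to watch is that sequence_length has to match the number of frames the model actually outputs, which is not exactly the audio length divided by 20ms. Below is a rough sketch of how the frame count could be worked out. It assumes the default convolution kernels and strides of the base Wav2Vec2 feature encoder (check conv_kernel and conv_stride in your model's config), and frame_labels_from_segments is a hypothetical helper I made up for illustration, not something from the library.

```python
import math

# Assumption: defaults of the base Wav2Vec2 feature encoder.
# Verify against model.config.conv_kernel / model.config.conv_stride.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples, kernels=CONV_KERNEL, strides=CONV_STRIDE):
    """Sequence length left after a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        length = (length - kernel) // stride + 1
    return length


def frame_labels_from_segments(segments, num_samples, sampling_rate=16_000,
                               strides=CONV_STRIDE):
    """Hypothetical helper: label a frame 1 if its start time lies inside
    any (start_seconds, end_seconds) segment, else 0."""
    hop = math.prod(strides)  # samples between consecutive frame starts
    labels = []
    for i in range(num_output_frames(num_samples)):
        t = i * hop / sampling_rate
        labels.append(int(any(start <= t < end for start, end in segments)))
    return labels


n_frames = num_output_frames(16_000)  # one second of 16 kHz audio
labels_1s = frame_labels_from_segments([(0.2, 0.5)], num_samples=16_000)
print(n_frames, len(labels_1s), sum(labels_1s))
```

The resulting list of 0s and 1s (one per frame) can then be turned into a tensor and one-hot encoded as in the snippet above. I believe the model also has a _get_feat_extract_output_lengths method that does the same length computation from its own config, which would be safer than hard-coding the kernels.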
