Fine-tuning a classification model with new labels


I am trying to fine-tune a language identification model using SpeechBrain. I followed the notebook [tutorial][1] on fine-tuning an ASR model. However, I am having trouble adding a new label to the model. I would like to add a new language, so I edited label_encoder.txt and added the line ‘yk: Yakut’ => 107, but during training I get the following error:

return torch._C._nn.nll_loss_nd(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target 107 is out of bound
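If I understand the error correctly, the pretrained head only outputs 107 log-probabilities, so valid target indices are 0..106 and my new index 107 falls outside that range. A minimal plain-PyTorch repro (the shapes here are assumed from the error message, not taken from SpeechBrain):

```python
import torch
import torch.nn.functional as F

# the pretrained classifier produces 107 scores per utterance,
# so valid targets are the indices 0..106
log_probs = torch.log_softmax(torch.randn(4, 107), dim=-1)

# targets within range: the loss is computed fine
F.nll_loss(log_probs, torch.tensor([0, 5, 106, 106]))

# target 107 is out of range for a 107-class output
try:
    F.nll_loss(log_probs, torch.tensor([0, 5, 106, 107]))
except IndexError as e:
    print(type(e).__name__, e)
```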

My language brain class looks like this:

class LanguageBrain(speechbrain.core.Brain):
    def on_stage_start(self, stage, epoch):
        # enable grad for all modules we want to fine-tune
        if stage == speechbrain.Stage.TRAIN:
            for module in [self.modules.compute_features, self.modules.mean_var_norm, 
                           self.modules.embedding_model, self.modules.classifier]:
                for p in module.parameters():
                    p.requires_grad = True
    def compute_forward(self, batch, stage):
        """Computation pipeline based on an encoder + language classifier.
        Data augmentation and environmental corruption are applied to the
        input speech.
        """
        batch = batch.to(self.device)
        wavs, lens = batch.sig
        feats = self.modules.compute_features(wavs)
        feats = self.modules.mean_var_norm(feats, lens)

        # Embeddings + speaker classifier
        embeddings = self.modules.embedding_model(feats, lens)
        outputs = self.modules.classifier(embeddings)

        return outputs, lens
    def compute_objectives(self, predictions, batch, stage):
        """Computes the loss using language-id as label."""
        predictions, lens = predictions
        uttid = batch.id
        langid = batch.lang_id_encoded

        if stage == speechbrain.Stage.TRAIN:
            langid = torch.cat([langid], dim=0)
        loss = self.hparams.compute_cost(predictions, langid.unsqueeze(1), lens)

        return loss
    def on_stage_end(self, stage, stage_loss, epoch=None):
        """Gets called at the end of an epoch."""
        stage_stats = {"loss": stage_loss}
        if stage == speechbrain.Stage.VALID:
            self.checkpointer.save_and_keep_only(
                meta={"loss": stage_stats["loss"]},
                min_keys=["loss"],
            )

I guess I need to do something with the classification layer. I deleted it and replaced it with a layer that has the number of output features I need:

import torch.nn as nn

class Identity(nn.Module):
    def forward(self, x):
        return x

# the nn.Linear assignment below immediately overwrites the Identity,
# so only the new 108-output linear layer takes effect
language_id.mods.classifier.out.w = Identity()
language_id.mods.classifier.out.w = nn.Linear(512, 108)

However, in that case I got another error:

RuntimeError: Error(s) in loading state_dict for Classifier:
    size mismatch for out.w.weight: copying a param with shape torch.Size([107, 512]) from checkpoint, the shape in current model is torch.Size([108, 512]).
    size mismatch for out.w.bias: copying a param with shape torch.Size([107]) from checkpoint, the shape in current model is torch.Size([108]).

How can I train the model to predict the new label class?
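For what it's worth, here is what I think the weight-copying step would look like, with plain nn.Linear layers standing in for classifier.out.w (a sketch; the layer shapes and the 512-dim embedding size are taken from the error message). The idea is to replace the layer only after the pretrained state_dict has been loaded, copying the 107 pretrained rows into a new 108-row layer so only the last row starts from scratch:

```python
import torch
import torch.nn as nn

# stand-ins for the pretrained head (107 languages)
# and the enlarged head (107 + 1 new language)
old_head = nn.Linear(512, 107)
new_head = nn.Linear(512, 108)

# copy the pretrained weights into the first 107 rows; row 107 keeps
# its random initialisation and is learned for the new language
with torch.no_grad():
    new_head.weight[:107] = old_head.weight
    new_head.bias[:107] = old_head.bias
```

Replacing the layer after loading should also avoid the size-mismatch error, since the checkpoint is then restored into a model whose shapes still match it.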


Hello everyone!
I have a similar question.
I am training a SpeechBrain speaker identification (SpeakerID) model. Since I have only a few audio recordings of my own, I am fine-tuning a model pre-trained on VoxCeleb (transfer learning). Is it correct to use VoxCeleb for this speaker identification task? I ask because there are other datasets such as AMI, VOICES, etc.

Thank you very much!

Hello everyone!

And how would I go about calculating a performance metric for this SpeakerID task?

I’m trying to change this metric →
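To clarify what I mean, the simplest metric I can think of for SpeakerID is classification accuracy over the predicted speaker indices. A plain-PyTorch sketch (the logits and labels here are made up for illustration):

```python
import torch

# made-up model outputs: 5 utterances scored against 3 enrolled speakers
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.1, 0.2, 3.0],
                       [1.2, 0.3, 0.4],
                       [0.1, 2.2, 0.3]])
labels = torch.tensor([0, 1, 2, 1, 1])

preds = logits.argmax(dim=-1)  # predicted speaker index per utterance
accuracy = (preds == labels).sum().item() / len(labels)
print(accuracy)  # 0.8 (4 of 5 utterances correct)
```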

Thank you very much!