Is there a complete Speech2Text example?

sfalk · November 24, 2021, 9:36am

Hi!

I am currently trying to train a Speech2TextModel from scratch but I can’t seem to find a complete example on how to do this.

I’ve started to go through it by myself but this turns out to be a trial & error kind of thing. For example, I don’t know how to create a Speech2TextTokenizer. I got my spm_file but how exactly should the vocab_file look like? Do I generate this from the SentencePieceProcesor? Why can’t I set the vocab file created by sentencpiece and so on…

Is there a comprehensive guide I overlooked?

Update:

I was able to proceed a bit further but only via debugging the code and some trial & error.

I am able to start the training now but I am not sure if I am using all the right pieces here. I don’t know if I need to use the Trainer or the Seq2SeqTrainer. My biggest problem is the data_collator as I have no clue what it’s supposed to return.

Right now I am returning the following:

@dataclass
class Speech2TextCollator:

    def __init__(self, processor: Speech2TextProcessor):
        self.processor = processor

    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        inputs = [torch.Tensor(f["inputs"]) for f in features]
        targets = [torch.Tensor(f["targets"]) for f in features]
        # Create batches
        inputs_batch = pad_sequence(inputs, batch_first=True)
        targets_batch = pad_sequence(targets, batch_first=True).long()
        attention_mask = pad_sequence([f["attention_mask"] for f in features], batch_first=True).long()
        return dict(
            input_features=inputs_batch,
            decoder_input_ids=targets_batch,
            attention_mask=attention_mask,
            labels=targets_batch
        )

Depending on whether I set label_smoothing_factor=1 for the TrainingArguments I get either a KeyError: 'logits' or KeyError: 'loss'.

Can somebody help me out here?

Topic		Replies	Views
Need help training Speech2Text from scratch 🤗Transformers	0	880	November 26, 2021
RuntimeError: grad can be implicitly created only for scalar outputs 🤗Transformers	0	1053	August 10, 2023
Train tokenizer for seq2seq model 🤗Tokenizers	0	340	April 19, 2024
Training sentencePiece from scratch? 🤗Tokenizers	8	19240	December 19, 2023
Keyword generation using T5 Models	4	1989	November 2, 2022

Is there a complete Speech2Text example?

Related topics