How to update the vocabulary of the Whisper processor

I want to fine-tune my Whisper model on a dataset whose transcriptions contain tokens that are not in the processor's vocabulary. How can I update the processor's vocabulary based on the transcripts in the training dataset?

I have tried using the code below:

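    # Collect every whitespace-separated token in the training transcripts
    # (assumes each item in `dataset` is a transcript string)
    all_tokens = []
    for sent in dataset:
        all_tokens.extend(sent.split())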

    old_vocab = processor.tokenizer.get_vocab()
    new_tokens = list(set(all_tokens) - set(old_vocab.keys()))

    processor.tokenizer.add_tokens(new_tokens)

But after doing this, the model produces no transcriptions when I use it for transcription.
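
One possible cause, not confirmed here, is that the tokenizer now has more entries than the model's embedding matrix, so the newly added token ids have no corresponding embeddings. The following is a minimal sketch of the same approach with an extra resize step, assuming the transcripts are plain strings and that `processor` and `model` are an already loaded Whisper processor and model. `find_new_tokens` and `extend_vocab` are hypothetical helper names used only for illustration; they are not part of transformers.

```python
from typing import Dict, Iterable, List


def find_new_tokens(transcripts: Iterable[str], vocab: Dict[str, int]) -> List[str]:
    """Return whitespace-separated words from `transcripts` that are not keys of `vocab`."""
    seen = set()
    for text in transcripts:
        seen.update(text.split())
    # Sort so the order in which tokens are added is reproducible across runs
    return sorted(seen - set(vocab))


def extend_vocab(processor, model, transcripts: Iterable[str]) -> int:
    """Add unseen words to the tokenizer, then resize the model embeddings to match.

    Assumption: `processor.tokenizer` is a Hugging Face tokenizer and `model`
    is a Hugging Face `PreTrainedModel` (for example a Whisper model).
    """
    new_tokens = find_new_tokens(transcripts, processor.tokenizer.get_vocab())
    num_added = processor.tokenizer.add_tokens(new_tokens)
    if num_added > 0:
        # Give the new token ids rows in the embedding matrix; these rows are
        # freshly initialized and only become useful after fine-tuning
        model.resize_token_embeddings(len(processor.tokenizer))
    return num_added
```

With a loaded processor and model, this would be called as `extend_vocab(processor, model, transcripts)` before fine-tuning. Even if the resize is the missing step, the new embeddings start untrained, so the model would not be expected to emit the new tokens sensibly until it has been fine-tuned on data that contains them.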

@sanchit-gandhi can you help me with this?