While following this blog, I set the tokenizer language to English, since the language I am fine-tuning on is not one Whisper knows but it is written in the English alphabet.
But in this case, if I remove the language setting from the generation config:
#model.generation_config.language = "hindi"
model.generation_config.task = "transcribe"
model.generation_config.forced_decoder_ids = None
How should I handle decoder_start_token_id here?
data_collator = DataCollatorSpeechSeq2SeqWithPadding(
    processor=processor,
    decoder_start_token_id=model.config.decoder_start_token_id,
)
Can I just remove this part?
#decoder_start_token_id=model.config.decoder_start_token_id,
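For context, here is my understanding of what the collator does with that argument, as a minimal sketch in plain Python lists rather than tensors. As far as I can tell from the blog's collator, it only checks whether every label sequence already begins with the start token and, if so, drops that first token, which would mean the check does not depend on the language setting. The helper name strip_leading_start_token and the token ids below are hypothetical and only for illustration; the real value comes from model.config.decoder_start_token_id.

```python
# Hypothetical id for illustration only; in practice use
# model.config.decoder_start_token_id instead of a hard-coded number.
DECODER_START_TOKEN_ID = 50258


def strip_leading_start_token(labels, decoder_start_token_id):
    """Drop the first token of every sequence, but only if ALL sequences
    in the batch begin with the decoder start token."""
    if labels and all(seq and seq[0] == decoder_start_token_id for seq in labels):
        return [seq[1:] for seq in labels]
    return labels


# Toy batch: every sequence starts with the (hypothetical) start token.
batch = [[50258, 50259, 11, 12], [50258, 50259, 13]]
stripped = strip_leading_start_token(batch, DECODER_START_TOKEN_ID)
print(stripped)
```

If that reading is right, the argument is about the start-of-transcript token rather than the language token. Also, if I read the blog's collator correctly, decoder_start_token_id is declared as a required field, so commenting the argument out would likely fail when the collator is constructed.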