I’m trying to use Whisper to generate a transcription. I get correct results with `model.generate()`, but when I call the model’s forward pass directly:
```python
out = model(inputs, decoder_input_ids=torch.tensor([[50258]]).to('cuda'))
predicted_ids = torch.argmax(out.logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
transcription
# ['<|startoftranscript|>']
```
I only get a single token of output. Is the issue with `decoder_input_ids`? I want to use `forward()` because I need the encoder and decoder embeddings for other tasks. Is there any workaround here?
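For reference, here is roughly the workaround I have in mind (a sketch, not verified end to end): since a single `forward()` call only predicts the *next* token at each position, the call has to be repeated in a greedy loop, appending the argmax token each time. The model name `openai/whisper-tiny` and the zero-filled features are placeholders standing in for my real model and audio:

```python
import torch
from transformers import WhisperForConditionalGeneration

# placeholder model; in my case it would be the checkpoint I'm actually using
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny").eval()
input_features = torch.zeros(1, 80, 3000)  # stand-in for real log-mel features

with torch.no_grad():
    # run the encoder once and reuse its output (the encoder embeddings)
    encoder_outputs = model.model.encoder(input_features)

    # greedy decoding: each forward pass predicts only the next token
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    for _ in range(10):
        out = model(
            encoder_outputs=encoder_outputs,
            decoder_input_ids=decoder_input_ids,
            output_hidden_states=True,  # exposes decoder_hidden_states as well
        )
        next_token = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
        if next_token.item() == model.config.eos_token_id:
            break
```

Is this loop the right way to replicate `generate()` while still having access to `encoder_outputs` and `out.decoder_hidden_states`?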