Decode Whisper logits to a transcript using forward() instead of generate()

I’m trying to use Whisper to generate transcriptions. I get results when using model.generate(), but when I call the model directly:

    import torch

    # One forward pass, with only <|startoftranscript|> (id 50258) as decoder input
    out = model(inputs, decoder_input_ids=torch.tensor([[50258]]).to('cuda'))
    predicted_ids = torch.argmax(out.logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)
    print(transcription)
    # ['<|startoftranscript|>']

I’m only getting a single output token. Is the problem with decoder_input_ids? I want to use forward because I want to use the encoder and decoder embeddings for other tasks. Is there a workaround?
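A forward pass returns one logit vector per position in decoder_input_ids, so feeding only the start token yields exactly one prediction. One workaround is to run the decoding loop yourself: take the argmax at the last position, append it, and call the model again. Below is a minimal sketch, assuming `model` is a Hugging Face `WhisperForConditionalGeneration`, `input_features` comes from the processor, and the batch size is 1. The helper name `greedy_decode_with_forward` and its parameters are hypothetical. For simplicity it re-runs the encoder at every step, which is much slower than generate().

```python
import torch


@torch.no_grad()
def greedy_decode_with_forward(model, input_features, prompt_ids, eos_token_id, max_new_tokens=100):
    """Hypothetical helper: greedy decoding built from plain forward() calls.

    Assumes batch size 1 (see next_id.item() below).
    """
    decoder_input_ids = torch.tensor([prompt_ids], device=input_features.device)
    out = None
    for _ in range(max_new_tokens):
        out = model(
            input_features,
            decoder_input_ids=decoder_input_ids,
            output_hidden_states=True,  # also return encoder/decoder hidden states
        )
        # logits: (batch, seq_len, vocab); only the last position predicts the next token
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        decoder_input_ids = torch.cat([decoder_input_ids, next_id], dim=-1)
        if next_id.item() == eos_token_id:
            break
    # `out` holds logits and hidden states from the final (longest) forward pass
    return decoder_input_ids, out
```

For example, `ids, out = greedy_decode_with_forward(model, inputs, [50258], processor.tokenizer.eos_token_id)` followed by `processor.batch_decode(ids, skip_special_tokens=True)`; `out.encoder_hidden_states` and `out.decoder_hidden_states` then hold the embeddings from the last pass. Check the prompt ids against `processor.tokenizer` for your checkpoint.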

You can get the hidden states (the encoder and decoder embeddings) from model.generate by setting two flags: output_hidden_states=True and return_dict_in_generate=True.
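A minimal sketch of reading those hidden states, assuming a Hugging Face Whisper `model` and the default `use_cache=True`; the helper name `generate_with_hidden_states` is hypothetical. With caching, the first decoder step covers the whole decoder prompt and each later step covers only the newly fed token, so concatenating the last-layer states along the time axis gives one vector per decoder input position.

```python
import torch


@torch.no_grad()
def generate_with_hidden_states(model, input_features, max_new_tokens=100):
    """Hypothetical helper: generate() plus encoder and decoder hidden states."""
    out = model.generate(
        input_features,
        max_new_tokens=max_new_tokens,
        output_hidden_states=True,
        return_dict_in_generate=True,
    )
    # Encoder: tuple over (embeddings + each layer), each (batch, frames, d_model)
    encoder_states = out.encoder_hidden_states
    # Decoder: one entry per generation step, each a tuple over layers.
    # Assumes use_cache=True, so each step only contains the newly fed positions.
    last_layer = torch.cat([step[-1] for step in out.decoder_hidden_states], dim=1)
    return out.sequences, encoder_states, last_layer
```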

Is it possible to access the logits through model.generate? I can get logits by calling the model directly but, as you mentioned, I couldn’t get more than one token that way.
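Once you have a full token sequence (for example from model.generate), a single teacher-forced forward pass over it returns logits and hidden states for every position at once. A minimal sketch, assuming `sequences` starts with the decoder start token; depending on the transformers version, generate() may or may not include the prompt tokens in its output, so check this. The helper name `logits_for_sequence` is hypothetical. These are raw model logits, so they can differ from the scores generate() reports, which have already passed through its logits processors (for example token suppression).

```python
import torch


@torch.no_grad()
def logits_for_sequence(model, input_features, sequences):
    """Hypothetical helper: one forward pass over a complete decoder sequence."""
    out = model(input_features, decoder_input_ids=sequences, output_hidden_states=True)
    # logits[:, i] scores the token at position i + 1, so shift by one
    next_token_logits = out.logits[:, :-1, :]
    targets = sequences[:, 1:]
    log_probs = torch.log_softmax(next_token_logits, dim=-1)
    # Log-probability the model assigns to each token that actually follows
    token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return out, token_log_probs
```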

You can get the scores from model.generate by passing output_scores=True (together with return_dict_in_generate=True) and then applying softmax to turn each step's logits into per-token probabilities.
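A minimal sketch of turning the per-step scores into probabilities for the tokens actually chosen, assuming greedy decoding and a Hugging Face Whisper `model`; the helper name `generate_with_token_probs` is hypothetical. `out.scores` holds one (batch, vocab) tensor per generated step, and the chosen tokens are assumed here to be the last `len(out.scores)` entries of `out.sequences`.

```python
import torch


@torch.no_grad()
def generate_with_token_probs(model, input_features, max_new_tokens=100):
    """Hypothetical helper: generate() plus the probability of each chosen token."""
    out = model.generate(
        input_features,
        max_new_tokens=max_new_tokens,
        output_scores=True,
        return_dict_in_generate=True,
    )
    # One (batch, vocab) tensor per step, already processed by generate's logits processors
    scores = torch.stack(out.scores, dim=1)  # (batch, steps, vocab)
    probs = torch.softmax(scores, dim=-1)
    # Assumption: the generated tokens are the trailing `steps` entries of sequences
    chosen = out.sequences[:, -scores.shape[1]:]
    chosen_probs = probs.gather(-1, chosen.unsqueeze(-1)).squeeze(-1)
    return out.sequences, chosen_probs
```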