Trying to get predicted text from fine-tuned whisper-base.en (quantized ONNX)

I am trying to get the predicted text from this model, but I only get two outputs:
last_hidden_state with shape (1, 2, 512) and onnx::MatMul_949 with shape (1, 1500, 512).
How can I get token ids from these two outputs?
This ONNX model was exported with transformers.onnx as a single file.
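
For context, here is a minimal sketch of what turning a hidden state into token ids would involve. It assumes that last_hidden_state is the decoder output before the vocabulary projection, and that the (1, 1500, 512) tensor is the encoder output (1500 audio positions, hidden size 512). The lm_head_weight array is hypothetical: it would have to be taken from the original checkpoint, because the exported graph described here does not return logits. The data below is random, not real Whisper weights.

```python
import numpy as np

def hidden_to_token_ids(last_hidden_state, lm_head_weight):
    """Project decoder hidden states to vocabulary logits and take the argmax.

    last_hidden_state: (batch, seq_len, d_model), e.g. (1, 2, 512)
    lm_head_weight:    (vocab_size, d_model); hypothetical array that would
                       have to be extracted from the checkpoint separately.
    """
    logits = last_hidden_state @ lm_head_weight.T  # (batch, seq_len, vocab_size)
    return logits.argmax(axis=-1)                  # (batch, seq_len)

# Toy demonstration with random data and a toy vocabulary of 100 entries.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((1, 2, 512)).astype(np.float32)
weight = rng.standard_normal((100, 512)).astype(np.float32)
token_ids = hidden_to_token_ids(hidden, weight)
print(token_ids.shape)  # (1, 2)
```

If this assumption holds, an export that includes the language-model head (so that the model returns logits directly) would avoid the manual projection.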

I also tried Optimum, which gave me several models:
“decoder_model.onnx”,
“decoder_model_quantized.onnx”,
“encoder_model.onnx”,
“decoder_model_merged.onnx”,
“decoder_with_past_model.onnx”,
“encoder_model_quantized.onnx”,
“decoder_model_merged_quantized.onnx”,
“decoder_with_past_model_quantized.onnx”

Here is the code I used these models in:

I used the encoder and decoder and was able to get logits that gave
valid token ids, but only 2 tokens. I am sure the first one is correct.
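
Getting only as many tokens as were fed in is what a single decoder pass would produce, so one possible explanation (an assumption, since the code is not shown) is that the decoder is run once instead of in a loop. Below is a minimal sketch of greedy decoding. step_logits is a hypothetical callable standing in for one run of the decoder ONNX session that returns the logits for the next position; fake_step is a stand-in used only to make the example runnable without the models.

```python
from typing import Callable, List, Sequence

def greedy_decode(
    step_logits: Callable[[Sequence[int]], Sequence[float]],
    start_ids: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 32,
) -> List[int]:
    """Repeatedly pick the argmax token and feed the grown sequence back in.

    step_logits is hypothetical: given all token ids so far, it returns the
    logits for the next position (one decoder run per call).
    """
    ids = list(start_ids)
    for _ in range(max_new_tokens):
        logits = step_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(next_id)
        if next_id == eos_id:  # stop once the end-of-text token is produced
            break
    return ids

# Stand-in decoder for demonstration only: it always favours (last id + 1),
# so the sequence counts upward until it reaches eos_id.
def fake_step(ids, vocab_size=6):
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]

decoded = greedy_decode(fake_step, start_ids=[0], eos_id=4)
print(decoded)  # [0, 1, 2, 3, 4]
```

With the real models, start_ids would be the decoder prompt tokens and eos_id the tokenizer's end-of-text id, and the resulting ids would be passed to the tokenizer's decode method to get text.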