Trying to get predicted text from a fine-tuned whisper-base.en (quantized ONNX)

I am trying to get the predicted text from this model, but it only returns two outputs:
last_hidden_state with shape (1, 2, 512) and onnx::MatMul_949 with shape (1, 1500, 512).
How can I get token ids from these two outputs?
This ONNX model was exported with transformers.onnx as a single output file.
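
One possible reading of these shapes: (1, 2, 512) looks like the decoder's hidden states for 2 input tokens, and (1, 1500, 512) looks like the encoder output, so neither of them is a logits tensor. In the Hugging Face Whisper implementation, logits come from projecting the decoder hidden states onto the vocabulary with the output head (`proj_out`), whose weight is tied to the decoder token embeddings. Below is a minimal NumPy sketch of that projection, assuming the weight matrix can be extracted from the original PyTorch checkpoint. The function name `hidden_to_token_ids`, the random arrays, and the small vocabulary size are illustrative stand-ins, not real model data.

```python
import numpy as np

def hidden_to_token_ids(last_hidden_state, lm_head_weight):
    """Project decoder hidden states onto the vocabulary and take the argmax.

    last_hidden_state: (batch, seq_len, d_model)
    lm_head_weight:    (vocab_size, d_model), assumed to be the output-head weight
    """
    logits = last_hidden_state @ lm_head_weight.T  # (batch, seq_len, vocab_size)
    return logits.argmax(axis=-1)                  # (batch, seq_len)

# Stand-in data with the same hidden size as in the question (512).
rng = np.random.default_rng(0)
hidden = rng.standard_normal((1, 2, 512)).astype(np.float32)
weight = rng.standard_normal((1000, 512)).astype(np.float32)  # hypothetical vocab size

token_ids = hidden_to_token_ids(hidden, weight)
print(token_ids.shape)  # (1, 2)
```

Each position's argmax is the prediction for the token that follows that input position, so only the last position matters when generating the next token.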

I also tried exporting with optimum, and it gave me several models:

Here is the code I used these models in:

I used the encoder and decoder and was able to get logits that gave
valid token ids, but only 2 tokens; the first one I am sure is correct.
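
Getting exactly 2 token ids would be consistent with a single decoder pass over 2 input tokens, which yields one prediction per input position. To produce a full transcription, the decoder usually has to be called repeatedly, appending the last position's argmax to the input each time until the end-of-text token appears. Below is a minimal sketch of such a greedy loop. `decoder_step` is a hypothetical callable standing in for a wrapper around the decoder ONNX session, and the token ids in the toy example are made up.

```python
def greedy_decode(decoder_step, start_ids, eos_id, max_new_tokens=50):
    """Greedy autoregressive decoding.

    decoder_step(ids) is assumed to return one row of logits per input position.
    """
    ids = list(start_ids)
    for _ in range(max_new_tokens):
        logits = decoder_step(ids)
        last = logits[-1]  # only the last position predicts the next token
        next_id = max(range(len(last)), key=last.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Toy stand-in for the decoder: always predicts "previous token id + 1".
def toy_step(ids, vocab_size=6):
    rows = []
    for tok in ids:
        row = [0.0] * vocab_size
        row[min(tok + 1, vocab_size - 1)] = 1.0
        rows.append(row)
    return rows

print(greedy_decode(toy_step, [0, 1], eos_id=5))  # [0, 1, 2, 3, 4, 5]
```

With a real model, `decoder_step` would also need the encoder output passed in on every call, and the start ids would be Whisper's special start-of-transcript tokens.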