How can I fuse the logits from different models and then convert the result to tokens?
Suppose I have two logits tensors x, y from GPT2-base and GPT2-large, both with the same shape: (8, 50, 32000).
8 is the batch size, 50 is the length of the input tokens, and 32000 is the vocab size.
Then I fuse x and y using: Z = x + y
So, how can I convert Z to tokens with a generation function?
I have tried [argmax(z) for z in Z], but the results are bad. It seems this is because I didn't consider the stopping criteria, but how should I handle that?
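One likely issue with the argmax-over-all-of-Z approach: Z has shape (batch, seq_len, vocab), so taking argmax everywhere re-predicts a token for every input position. For generation you only want the argmax of the last position's fused logits, then append that token and run the models again, stopping when an end-of-sequence token appears or a max length is reached. Here is a minimal greedy-decoding sketch of that loop; the two model functions are random stand-ins (not real GPT-2 calls), and `EOS_ID` is a hypothetical end-of-sequence id, so adapt them to your actual models and tokenizer (both models must share the same vocabulary for x + y to be meaningful):

```python
import numpy as np

VOCAB_SIZE = 32000   # vocab size from the question (GPT-2's actual vocab is 50257)
EOS_ID = 2           # hypothetical end-of-sequence token id; use your tokenizer's
MAX_NEW_TOKENS = 20  # fallback stopping criterion

rng = np.random.default_rng(0)

def model_a_logits(input_ids):
    """Stand-in for GPT2-base: returns (batch, seq_len, vocab) logits."""
    b, t = input_ids.shape
    return rng.standard_normal((b, t, VOCAB_SIZE))

def model_b_logits(input_ids):
    """Stand-in for GPT2-large: same shape as model_a_logits."""
    b, t = input_ids.shape
    return rng.standard_normal((b, t, VOCAB_SIZE))

def fused_greedy_generate(input_ids):
    finished = np.zeros(input_ids.shape[0], dtype=bool)
    for _ in range(MAX_NEW_TOKENS):
        x = model_a_logits(input_ids)
        y = model_b_logits(input_ids)
        z = x + y                               # fuse the two models' logits
        # Only the LAST position's logits predict the next token.
        next_ids = z[:, -1, :].argmax(axis=-1)
        # Once a sequence has emitted EOS, keep padding it with EOS.
        next_ids = np.where(finished, EOS_ID, next_ids)
        input_ids = np.concatenate([input_ids, next_ids[:, None]], axis=1)
        finished |= next_ids == EOS_ID
        if finished.all():                      # all sequences hit EOS: stop early
            break
    return input_ids

prompt = rng.integers(0, VOCAB_SIZE, size=(8, 50))
out = fused_greedy_generate(prompt)
```

With real models you would decode `out` back to text with the shared tokenizer. Note that greedy argmax is only one decoding strategy; sampling from softmax(Z) at the last position, or beam search, may give better text.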