T5 generation compatibility with original code

I managed to load the original T5 checkpoints (after some fine-tuning in Mesh TensorFlow) into PyTorch with the provided scripts. However, I noticed a strange issue: the generated texts are exactly the same as those from the original Mesh TensorFlow code, except that they are always a little shorter. I also noticed that the generate method starts with 0 as the first of the decoder_input_ids, whereas Mesh TensorFlow seems to do something different: I think it initializes a so-called Context object or something similar. Any ideas on how to make the two compatible (assuming I’m right about this)?
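To make the "same text, just shorter" observation concrete, here is a minimal sketch of how the two outputs could be compared at the token level. It only checks whether the PyTorch output is a strict prefix of the Mesh TensorFlow output, which would point at early stopping or a length limit rather than at different predictions. The function name `diff_generations` and the token ids in the example are hypothetical and purely illustrative; they are not taken from either codebase.

```python
from typing import Any, Dict, Sequence


def diff_generations(pt_ids: Sequence[int], tf_ids: Sequence[int]) -> Dict[str, Any]:
    """Compare two generated token-id sequences (hypothetical helper).

    pt_ids: ids produced by the PyTorch model
    tf_ids: ids produced by the original Mesh TensorFlow model
    """
    # Count how many leading tokens the two sequences share.
    common = 0
    for a, b in zip(pt_ids, tf_ids):
        if a != b:
            break
        common += 1

    shorter = min(len(pt_ids), len(tf_ids))
    return {
        "common_prefix_len": common,
        # True when the shorter sequence is fully contained at the start
        # of the longer one, i.e. the outputs differ only in length.
        "is_pure_truncation": common == shorter,
        # Tokens the PyTorch output lacks, if it is a prefix of the TF output.
        "missing_from_pytorch": list(tf_ids[common:]) if common == len(pt_ids) else None,
    }


# Made-up ids, only to show the shape of the result.
report = diff_generations([5, 6, 7], [5, 6, 7, 8, 1])
print(report)
```

If `is_pure_truncation` comes back true for every example, the two models agree on each token they both emit, and the difference would more likely lie in how generation is stopped or how long it is allowed to run than in how the decoder is started.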