Hi @Rkoy,
Since #241, it is possible to export a single decoder that does not take pre-computed key/values as inputs. As a result, the `past_key_values` are recomputed at each generation step. To enable this export, you only need to set `use_cache` to `False` when calling the `from_pretrained` method. To speed up decoding by reusing the key/value hidden states already computed in the previous generation step, you need to export a second decoder that takes the pre-computed key/values as additional inputs.
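
To make the trade-off concrete, here is a toy sketch in plain Python. It is not Optimum or ONNX Runtime code: `project_kv`, `decode_without_cache` and `decode_with_cache` are hypothetical names made up for illustration, and the "projection" is a stand-in for the real key/value computation. It only counts how many key/value projections each approach performs.

```python
# Toy illustration only: hypothetical helpers, not the Optimum API.

def project_kv(token):
    """Stand-in for computing the key/value pair of a single token."""
    return (token * 2, token * 3)


def decode_without_cache(tokens):
    """Single decoder, no past key/values: recompute everything each step."""
    key_values = []
    projections = 0
    for step in range(1, len(tokens) + 1):
        # Every step re-projects all tokens seen so far.
        key_values = [project_kv(t) for t in tokens[:step]]
        projections += len(key_values)
    return key_values, projections


def decode_with_cache(tokens):
    """Decoder with past key/values: only the new token is projected."""
    past_key_values = []
    projections = 0
    for token in tokens:
        # Reuse the cached pairs and append the pair for the new token.
        past_key_values.append(project_kv(token))
        projections += 1
    return past_key_values, projections


if __name__ == "__main__":
    tokens = [1, 2, 3, 4]
    print(decode_without_cache(tokens)[1])  # 10 projections
    print(decode_with_cache(tokens)[1])     # 4 projections
```

Both functions end up with the same key/values, but without the cache the number of projections grows quadratically with the sequence length, which is why the second decoder is worth exporting when generation speed matters.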