What could be the reason for duplicate tokens when using model.generate with a converted model?

Hello there!

I ran into a duplicate-token issue when trying to use a converted model (Marian model, Helsinki-NLP / Tatoeba-Challenge) in the HF environment.

I converted the model to an HF-usable format with the convert_marian_to_pytorch.py script:

%%bash
cd /content/gdrive/MyDrive/convert_exp/transformers
python src/transformers/models/marian/convert_marian_to_pytorch.py --src PATH_TO_MARIAN_MODEL --dest PATH_TO_PT_MODEL

>> added 1 tokens to vocab
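As a sanity check after conversion, the special-token ids in the saved config can be compared against the tokenizer (a minimal sketch; for Marian models decoder_start_token_id normally equals the pad id):

from transformers import AutoConfig, AutoTokenizer

# Compare what the tokenizer reports with what the converted config stores;
# for Marian, decoder_start_token_id is normally the pad id.
config = AutoConfig.from_pretrained("PATH_TO_PT_MODEL")
tokenizer = AutoTokenizer.from_pretrained("PATH_TO_PT_MODEL")
print(tokenizer.pad_token_id, config.pad_token_id)
print(tokenizer.eos_token_id, config.eos_token_id)
print(config.decoder_start_token_id, config.vocab_size, len(tokenizer))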

When working with the conversion result, the tokenizer behaves fine, but model.generate() keeps repeating a single token.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PATH_TO_PT_MODEL")
model = AutoModelForSeq2SeqLM.from_pretrained("PATH_TO_PT_MODEL")

# Persian input: "Hello, can you translate this?"
input_line, device = 'سلام ، آیا می توانید این را ترجمه کنید؟', 'cuda'
model.to(device)
inputs = tokenizer.encode(input_line, return_tensors="pt").to(device)
out = model.generate(inputs, max_length=100)  # limiting max_length
out

>> tensor([[63282, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,
     12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,
     12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,
     12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,
     12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,
     12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,
     12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,
     12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,
     12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,
     12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927, 12927,     0]],
   device='cuda:0')
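Decoding shows the repetition as plain text (the 0 at the end is Marian's usual eos id):

# What do the generated ids map to?
print(tokenizer.convert_ids_to_tokens([63282, 12927]))
print(tokenizer.decode(out[0], skip_special_tokens=True))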

I assumed the special tokens had ended up at the wrong positions in the vocabulary, so I applied shift_tokens_right, by analogy with what is described in this topic.

from transformers.models.marian import modeling_marian

inputs = modeling_marian.shift_tokens_right(
    input_ids=inputs,
    pad_token_id=tokenizer.pad_token_id,
    decoder_start_token_id=tokenizer.pad_token_id,
)
out = model.generate(inputs, max_length=100)  # limiting max_length
out

>> tensor([[63282, 40393, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,
         15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,
         15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,
         15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,
         15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,
         15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,
         15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,
         15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,
         15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,
         15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876, 15876,     0]])
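For context, shift_tokens_right is normally used to build decoder_input_ids from training labels; generate() creates the decoder inputs itself from decoder_start_token_id, so shifting the encoder inputs only prepends a pad token and drops the final eos. A minimal sketch of the intended use (the ids below are made up):

import torch
from transformers.models.marian.modeling_marian import shift_tokens_right

# Intended use: labels -> decoder_input_ids for training.
labels = torch.tensor([[5, 6, 7, 0]])  # w x y </s> (hypothetical ids)
decoder_input_ids = shift_tokens_right(
    labels, pad_token_id=63281, decoder_start_token_id=63281  # hypothetical pad id
)
print(decoder_input_ids)  # tensor([[63281, 5, 6, 7]]) -> <pad> w x y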

Replacing the pad_token_id in the inputs with -100 didn’t give a proper result either.

inputs[inputs == tokenizer.pad_token_id] = -100
out = model.generate(inputs, max_length=100)  # limiting max_length
out

>> ---------------------------------------------------------------------------
>> IndexError                                Traceback (most recent call last)
>> <ipython-input-40-d891cf4b1584> in <module>()
>> ----> 1 out = model.generate(inputs, max_length=100) #limiting max_length
>>       2 out

>> 7 frames
>> /usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
>>    1911         # remove once script supports set_grad_enabled
>>    1912         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
>> -> 1913     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
>>    1914 
>>    1915 

>> IndexError: index out of range in self
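In hindsight this error makes sense: -100 is the ignore_index used by the cross-entropy loss on labels, not a valid vocabulary id, so the embedding lookup fails as soon as it reaches the model. A minimal sketch of where -100 normally belongs:

# -100 belongs in labels (ignored by the loss), never in input_ids.
inputs = tokenizer.encode(input_line, return_tensors="pt").to(device)
labels = inputs.clone()
labels[labels == tokenizer.pad_token_id] = -100  # masked out of the loss (a no-op here: a single sentence has no padding)
loss = model(input_ids=inputs, labels=labels).loss  # fine: -100 only appears in labels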

Did I make a mistake during conversion, or did I miss a step needed to make the model work in the HF environment?
A draft of the work can be viewed in this Google Colab.