Question about Multilingual Tokenizers' Expected Behaviour

Hi there,

I am trying to fine-tune an MBart model for text generation on a multilingual dataset. Following the steps from the documentation (MBart and MBart-50) to tokenize sequences in different languages, I am seeing unexpected behaviour:

from transformers import MBartForConditionalGeneration, MBartTokenizer

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO")
example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"

# Tokenize the source (English) sentence
inputs = tokenizer(example_english_phrase, return_tensors="pt")
# Tokenize the target (Romanian) sentence to use as labels
with tokenizer.as_target_tokenizer():
    labels = tokenizer(expected_translation_romanian, return_tensors="pt")

From my understanding, the source and target input ids should have different structures, as the docs describe:

The source text format is X [eos, src_lang_code], where X is the source text. The target text format is [tgt_lang_code] X [eos]. bos is never used.
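
Concretely, this is the token layout I was expecting when converting the ids back to tokens (a sketch; the subword pieces in the comments are just illustrative):

src_tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
# expected: ['▁UN', '▁Chief', ..., '</s>', 'en_XX']   ->  X [eos, src_lang_code]
tgt_tokens = tokenizer.convert_ids_to_tokens(labels["input_ids"][0].tolist())
# expected: ['ro_RO', '▁Şeful', ..., '</s>']          ->  [tgt_lang_code] X [eos]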

When I run this snippet, however, this is what I actually get:

print(inputs['input_ids'])
tensor([[  8274, 127873,  25916,      7,   8622,   2071,    438,  67485,     53,     187895,     23,  51712,      2, 250004]])

print(labels['input_ids'])
tensor([[ 47711,   7844, 127666,      8,  18347,  18147,   1362,    315,  42071,   36,  31563,   8454,  33796,    451,    346, 125577,      2, 250020]])

In both cases the language token is at the end of the sequence. Shouldn't it be at the end for the input and at the start for the target?
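
Decoding the trailing ids seems to confirm this (assuming I am reading the vocabulary correctly: 2 is </s>, and 250004 / 250020 are the en_XX / ro_RO language codes in this checkpoint):

print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())[-2:])
# ['</s>', 'en_XX']
print(tokenizer.convert_ids_to_tokens(labels["input_ids"][0].tolist())[-2:])
# ['</s>', 'ro_RO']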

Thanks in advance!