Adding a special language token to MBART

Is there a clean way to add a new language to MBART’s tokenizer? The implementation (this is from `MBartTokenizer.__init__` in transformers) looks quite intricate, so adding a new language code does not seem trivial:

        self.sp_model_size = len(self.sp_model)
        self.lang_code_to_id = {
            code: self.sp_model_size + i + self.fairseq_offset for i, code in enumerate(FAIRSEQ_LANGUAGE_CODES)
        }
        self.id_to_lang_code = {v: k for k, v in self.lang_code_to_id.items()}
        self.fairseq_tokens_to_ids["<mask>"] = len(self.sp_model) + len(self.lang_code_to_id) + self.fairseq_offset

        self.fairseq_ids_to_tokens = {v: k for k, v in self.fairseq_tokens_to_ids.items()}
        self._additional_special_tokens = list(self.lang_code_to_id.keys())

        if additional_special_tokens is not None:
            # Only add those special tokens if they are not already there.
            self._additional_special_tokens.extend(
                [t for t in additional_special_tokens if t not in self._additional_special_tokens]
            )

        self._src_lang = src_lang if src_lang is not None else "en_XX"
        self.cur_lang_code_id = self.lang_code_to_id[self._src_lang]
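To make sure I understand the snippet, here is the id layout restated with plain dicts and toy numbers (the real `fairseq_offset` is 1; the `sp_model_size` and the truncated code list here are made up for illustration):

```python
# Toy reconstruction of MBART's vocab layout: fairseq reserves the first
# ids (the offset), then come the SentencePiece pieces, then the language
# codes, then <mask> at the very end.
FAIRSEQ_LANGUAGE_CODES = ["ar_AR", "cs_CZ", "en_XX"]  # truncated list
fairseq_offset = 1    # value used in the HF implementation
sp_model_size = 100   # toy stand-in for len(self.sp_model)

lang_code_to_id = {
    code: sp_model_size + i + fairseq_offset
    for i, code in enumerate(FAIRSEQ_LANGUAGE_CODES)
}
id_to_lang_code = {v: k for k, v in lang_code_to_id.items()}
mask_id = sp_model_size + len(lang_code_to_id) + fairseq_offset
```

If I read this right, the language-code block sits *between* the SentencePiece pieces and `<mask>`, so naively appending a new code to `FAIRSEQ_LANGUAGE_CODES` would shift the id of `<mask>` and break alignment with a pretrained checkpoint's embedding table. That is part of what makes this feel non-straightforward.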

Even with subclassing, it is not immediately clear to me whether and how one could add a custom language code that is correctly recognized when tokenizing targets (e.g. `tokenizer(text_target=...)`) or in other target-language-related handling. Any tips?
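For concreteness, the workaround I am currently experimenting with looks roughly like this. The helper name `register_language_code` is mine, and `_StubTokenizer` is a minimal stand-in so the sketch runs without downloading a checkpoint; I have no idea yet whether patching these lookup tables is actually sufficient for a real `MBartTokenizer`:

```python
def register_language_code(tokenizer, code):
    """Append `code` as a new special token and wire it into MBART's
    language-code lookup tables. Assumes `tokenizer` exposes the same
    members as transformers' MBartTokenizer."""
    tokenizer.add_special_tokens({"additional_special_tokens": [code]})
    new_id = tokenizer.convert_tokens_to_ids(code)
    tokenizer.lang_code_to_id[code] = new_id
    tokenizer.id_to_lang_code[new_id] = code
    return new_id


class _StubTokenizer:
    """Minimal stand-in for MBartTokenizer (toy ids), only here so the
    sketch is self-contained."""
    def __init__(self):
        self._vocab = {"en_XX": 250001}
        self.lang_code_to_id = {"en_XX": 250001}
        self.id_to_lang_code = {250001: "en_XX"}
        self._next_id = 250027  # first unused id in this toy vocab

    def add_special_tokens(self, mapping):
        for token in mapping["additional_special_tokens"]:
            if token not in self._vocab:
                self._vocab[token] = self._next_id
                self._next_id += 1

    def convert_tokens_to_ids(self, token):
        return self._vocab[token]


tok = _StubTokenizer()
new_id = register_language_code(tok, "xx_XX")

# Intended real usage (untested against an actual checkpoint):
# tok = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
# register_language_code(tok, "xx_XX")
# tok.tgt_lang = "xx_XX"
```

Note that `add_special_tokens` appends the new token at the end of the vocabulary, so the model's embedding matrix would also need resizing (`model.resize_token_embeddings(len(tokenizer))`), and the new code's embedding starts out untrained.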