Getting started with a community BART model that has no documentation


I’d like to run some fine-tuning experiments with this German BART model, but am finding it difficult to even get started due to the lack of documentation.

From what I can tell, the model is configured as FSMTForConditionalGeneration, which requires language tags to be specified when loading the tokenizer. My naïve guess was to specify something like ['de', 'de'] (for German) or ['src', 'tgt']. However, with either of these, the tokenizer maps any input text to a sequence of identical token IDs (all 3, which I assume is the unknown token). Below is a minimal example.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("timo/timo-BART-german", ['de', 'de'])
text = "Meine Freunde sind nett aber sie essen zu viel Kuchen."
input_ids = tokenizer([text], add_special_tokens=False, return_tensors='pt')['input_ids']

>>> tensor([[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
                    3, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
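
To put a number on the symptom, here is a small sketch of the sanity check I have in mind. It assumes that ID 3 is the tokenizer's unknown-token ID, which I have not verified. The helper name unk_fraction and the constant ASSUMED_UNK_ID are my own and not part of transformers. With a loaded tokenizer, the hard-coded value could presumably be replaced by tokenizer.unk_token_id.

```python
from typing import Sequence


def unk_fraction(token_ids: Sequence[int], unk_id: int) -> float:
    """Return the share of token IDs equal to unk_id (0.0 for empty input)."""
    if not token_ids:
        return 0.0
    return sum(1 for t in token_ids if t == unk_id) / len(token_ids)


# Stand-in list mirroring the all-3 output shown above.
observed_ids = [3] * 34

# Assumption (unverified): 3 is the unknown-token ID for this tokenizer.
ASSUMED_UNK_ID = 3

# A value of 1.0 means every token matched the assumed unknown ID,
# i.e. nothing in the sentence was found in the vocabulary.
print(unk_fraction(observed_ids, ASSUMED_UNK_ID))
```

If the fraction stays at 1.0 for every language-tag combination I try, that would suggest the vocabulary files are not being picked up at all, rather than the tags merely being wrong.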

Is anyone able to point me in the direction of a good tutorial/guide on how to get started with community models? Or better yet, @timo, any chance of providing a model card for this model to give an idea of its status/usability?

Thanks in advance!