Tiny mBART doc/info

I could not find any documentation/info for the sshleifer/tiny-mbart model. How big is it? How was it trained? What is its performance, etc.? Did I miss something?

AFAIK, this model was created for testing purposes. Pinging @sshleifer for confirmation.

…any chance of a production-ready smaller mBART? Like an mbart-base?

not that I know of.

How small do you want @martin-avrios ?
For english-romanian or cc25?

I will distill anything en-ro if you can figure out TPU 🙂

tiny-mBART is just for testing purposes, it’s randomly initialized.

For cc25. I am working with mostly German and some French and Italian docs, and if I don’t freeze embeds, mbart-large-cc25 does not fit into the 16 GB V100 (the largest single GPU Google Cloud offers), not even with batch size 1. So I thought I’d try a smaller mBART. But yes, TPU would be even more awesome, because then I could experiment with a lot more.

Yeah I have been running everything with --freeze_embeds.
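For reference, freezing the embeddings just means setting `requires_grad=False` on those parameters so the optimizer keeps no state for them. A minimal sketch of the idea, assuming the module layout of recent transformers versions (`model.model.shared`, `encoder.embed_positions`); the tiny random config is only there so the snippet runs without downloading the full mbart-large-cc25 checkpoint:

```python
from transformers import MBartConfig, MBartForConditionalGeneration

def freeze_embeds(model):
    # Freeze the shared token embeddings and each stack's positional
    # embeddings; no gradients or optimizer state are kept for them.
    for param in model.model.shared.parameters():
        param.requires_grad = False
    for stack in (model.model.encoder, model.model.decoder):
        for param in stack.embed_positions.parameters():
            param.requires_grad = False

# Tiny randomly initialized config for illustration; swap in
# MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
# for real use.
config = MBartConfig(
    vocab_size=128, d_model=16,
    encoder_layers=1, decoder_layers=1,
    encoder_attention_heads=2, decoder_attention_heads=2,
    encoder_ffn_dim=32, decoder_ffn_dim=32,
)
model = MBartForConditionalGeneration(config)
freeze_embeds(model)
```

Since mBART ties the LM head to the shared embeddings, freezing the shared table also stops gradient flow into the output projection weights.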

I’m also trying to figure out how to trim the embeddings, as most of them aren’t used, but blocked on
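The core of trimming is simple even if wiring it into mBART isn’t: copy only the embedding rows for the token ids your corpus actually uses, then remap ids into the smaller table. A toy torch-only sketch (`trim_embeddings` and the id set are made up for illustration; a real version would also have to remap the tokenizer and the tied LM head):

```python
import torch
import torch.nn as nn

def trim_embeddings(embedding: nn.Embedding, keep_ids):
    # Build a smaller embedding containing only the rows for keep_ids.
    # Callers must remap token ids: old_id -> position within keep_ids.
    keep = torch.tensor(sorted(keep_ids), dtype=torch.long)
    trimmed = nn.Embedding(len(keep), embedding.embedding_dim)
    trimmed.weight.data.copy_(embedding.weight.data[keep])
    old_to_new = {int(old): new for new, old in enumerate(keep.tolist())}
    return trimmed, old_to_new

# mbart-large-cc25 has a ~250k-row vocab; toy dim here to keep it cheap.
full = nn.Embedding(250027, 8)
small, mapping = trim_embeddings(full, {0, 2, 5, 100})
```

With only a few languages in play, most of the 250k rows are dead weight, which is where the memory savings come from.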

I’m also happy to finetune mbart-cc25 on a public dataset for you, if that would help.
Also, have you tried using Marian?

Have not tried Marian yet, but it seems interesting. It’s for translation, right? Since I framed my problem as translation, it could definitely work.

Also, I found excellent pre-trained models on TF Hub, but they are not fine-tunable (according to the page): TransformerXL pre-trained on Wiki40B (a new dataset in 40 languages), with a separate model for each language. At least for me this would be the ultimate model: seq2seq, unlimited sequence length, and 41 languages. See https://tfhub.dev/google/collections/wiki40b-lm/1


Yes, it’s for translation.
Try it out! It’s much smaller/faster than BART, and we have 1100 language pairs:

I’ve never actually finetuned it so let me know if there are any bugs!

I noticed there is a de-de pair, which is exactly what I need, but I wonder who else needs this and what for. Looks like somebody has already tried summarization in German?

link to model?


I believe that is an accident, not a summarization model.

According to Jörg Tiedemann, the author, it’s a paraphraser, rather than a summarizer. He writes,

There are texts with alternative translations into the same language, which I used for training intralingual models like this one. They are maybe not very useful at this moment as they probably just copy the input text. That this only obtains 40 BLEU is not that strange as this is tested with paraphrased sentences. Note also that this model is really rather a paraphrase model than a summarisation model as they seem to use it in the discussion