Tutorial on how to train a translation model from scratch in a Pythonic way?

ahGadji · June 22, 2022, 5:04pm

Hi Hugging Face community!

I’m working on a translation model for a language pair that is not yet covered by any model on the Hub.
So training from scratch is the only way to go, but I haven’t found any tutorial on training Hugging Face models from scratch. I want to use the MarianMT class, but I’m still discovering the Hugging Face library and don’t know how to proceed. For example, MarianTokenizer (the tokenizer for the MarianMT model) requires a SentencePiece model stored in a .spm file, but from what I’ve seen online, SentencePiece models are only stored in .model files, and I’m unfamiliar with the .spm format. So I’m stuck. Is there a way to train a translation model from scratch?

victordiao · November 29, 2022, 2:21pm

Same question! I want to train a translation model from scratch to reproduce the results in

RaphaelKalandadze · October 23, 2023, 10:31am

Hi, thanks for this question.
Did you manage to solve this problem?
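
On the .spm point from the original post: as far as I can tell, the source.spm and target.spm files that ship with Marian checkpoints are ordinary SentencePiece model files under a different extension, so a .model file produced by SentencePiece training can simply be copied to that name. Below is a minimal sketch of the remaining glue, not an official recipe. It assumes you have already trained two SentencePiece models (which also produces the .vocab files read here); the helper names, the special-token layout (`</s>` = 0, `<unk>` = 1, `<pad>` last) and the default model size are assumptions of this sketch.

```python
import json
import shutil
from pathlib import Path


def read_pieces(vocab_path):
    """Read pieces from a SentencePiece .vocab file ('piece<TAB>score' per line)."""
    pieces = []
    with open(vocab_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                pieces.append(line.split("\t")[0])
    return pieces


def build_vocab(source_pieces, target_pieces):
    """Merge both piece lists into one token -> id map for vocab.json."""
    # Assumed layout for this sketch: </s> = 0, <unk> = 1, <pad> = last id.
    vocab = {"</s>": 0, "<unk>": 1}
    for piece in list(source_pieces) + list(target_pieces):
        if piece != "<pad>" and piece not in vocab:
            vocab[piece] = len(vocab)
    vocab["<pad>"] = len(vocab)
    return vocab


def write_tokenizer_dir(out_dir, source_model, target_model, vocab):
    """Lay out the files under the names MarianTokenizer looks for."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # The .spm files are plain copies of the SentencePiece .model files.
    shutil.copyfile(source_model, out / "source.spm")
    shutil.copyfile(target_model, out / "target.spm")
    (out / "vocab.json").write_text(
        json.dumps(vocab, ensure_ascii=False), encoding="utf-8"
    )
    return out


def build_untrained_model(vocab):
    """Create a randomly initialised MarianMTModel sized to the vocab."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import MarianConfig, MarianMTModel

    config = MarianConfig(
        vocab_size=len(vocab),
        pad_token_id=vocab["<pad>"],
        eos_token_id=vocab["</s>"],
        # Marian starts decoding from the pad token.
        decoder_start_token_id=vocab["<pad>"],
    )
    return MarianMTModel(config)  # weights are random, nothing is pretrained
```

The resulting directory should then be loadable with `MarianTokenizer.from_pretrained(out_dir)`, although that is worth checking against your installed transformers version. The untrained model can be trained like any other seq2seq model, for example with `Seq2SeqTrainer`.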
