Hello there
I was trying to run the code in the Colab notebook provided with the tutorial "How to train a new language model from scratch using Transformers and Tokenizers".
I ran into a problem when reaching this snippet:
tokenizer = RobertaTokenizerFast.from_pretrained("./EsperBERTo", max_len=512)
I kept getting the error:
file ./EsperBERTo/config.json not found
So I did some research and found out that this line of code may be outdated:
tokenizer.save_model("EsperBERTo")
I then changed it to:
tokenizer.save_pretrained("EsperBERTo")
since, according to the documentation, the save_pretrained method lets you specify the save directory: "Directory where the configuration JSON file will be saved (will be created if it does not exist)."
This made the previous error disappear.
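Since the error is about a file the loader expects but cannot find, it can help to check what is actually in the directory before calling from_pretrained. Below is a minimal, standard-library-only sketch; the helper name missing_files and the list of expected files are my own assumptions (vocab.json and merges.txt are what a byte-level BPE tokenizer's save_model writes, and config.json is the file named in the error), not something taken from the tutorial or the library.

```python
from pathlib import Path

# Assumed set of files the loader needs in the tokenizer directory.
# vocab.json and merges.txt come from the byte-level BPE tokenizer;
# config.json is the file the error message complains about.
EXPECTED_FILES = ("vocab.json", "merges.txt", "config.json")


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `directory`."""
    d = Path(directory)
    if not d.is_dir():
        # The directory itself is missing, so every expected file is missing too.
        return list(expected)
    return [name for name in expected if not (d / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("./EsperBERTo")
    if missing:
        print("Missing before from_pretrained:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

Running this right after the save step shows whether the directory was created and which of the assumed files it contains, which makes it easier to tell whether a different save call actually changed what ends up on disk.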
I guess the tutorial may need an update.