Should have a `model_type` key in its config.json

SpamMe · September 20, 2021, 4:07pm

I have trained a tokenizer from scratch, using:

tokenizer.train(files=[pth], vocab_size=52_000, min_frequency=2, special_tokens=[
    "<s>",
    "<pad>",
    "</s>",
    "<unk>",
    "<mask>",
])

I save the tokenizer, use it to train a BERT model from scratch, and later want to test this model using:

unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)

But it complains that the tokenizer is unrecognized:

"[…] Should have a `model_type` key in its config.json"

How can I save the tokenizer so that there is a model_type indicated in config.json?
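
For illustration, here is a minimal sketch of one possible workaround, using only the Python standard library. It assumes, going by the error text alone, that the loader looks for a `model_type` key in the `config.json` of the directory it is given. The helper name `ensure_model_type` is hypothetical and not part of Transformers, and the default value `"bert"` is an assumption based on the BERT model mentioned above; a different architecture would need its own value.

```python
import json
from pathlib import Path


def ensure_model_type(save_dir, model_type="bert"):
    """Add a `model_type` entry to save_dir/config.json if it is missing.

    Hypothetical helper: `model_type="bert"` is an assumed default.
    Returns the resulting config as a dict.
    """
    config_path = Path(save_dir) / "config.json"

    # Start from the existing config if there is one, else from an empty dict.
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))

    # Only fill in the key when absent, so an existing value is never overwritten.
    config.setdefault("model_type", model_type)

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Calling `ensure_model_type("./my_tokenizer_dir")` (the directory name is a placeholder) would leave an existing `model_type` untouched and only add the key when it is absent. Whether this alone is enough for `pipeline` to accept the directory depends on the other files saved there.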
