Hi there,
I’m trying to further train from the scibert_scivocab_uncased model, using the run_mlm script. I’ve had no issues further training from BERT_base and RoBERTa but I’m a bit stuck with sciBERT.
SciBERT is not one of the basic models you can call directly from run_mlm.py, so I downloaded the model from the allenai/scibert_scivocab_uncased repository on the Hugging Face Hub (main branch)
to run:

    python myrun_mlm.py --model_name_or_path=scibert_scivocab_uncased
But the tokenizer files (tokenizer.json, tokenizer_config.json,…) are missing from the download, so the script fails. I can’t find the tokenizer files in the AllenAI SciBERT GitHub repo either.
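A minimal sketch of one possible workaround, not verified against this exact checkpoint. It assumes the downloaded folder contains the vocab.txt that the Hub repo ships, and that the "uncased" vocabulary should be lowercased. The idea is to build a fast BERT tokenizer from vocab.txt with transformers and save the missing tokenizer files next to the weights. The helper name export_tokenizer_files is hypothetical.

```python
from pathlib import Path

from transformers import BertTokenizerFast


def export_tokenizer_files(model_dir: str) -> BertTokenizerFast:
    """Hypothetical helper: build a fast tokenizer from <model_dir>/vocab.txt
    and write the tokenizer files into the same folder.

    Assumes model_dir holds the vocab.txt shipped with the checkpoint and that
    the uncased SciBERT vocabulary expects lowercased input.
    """
    vocab_file = Path(model_dir) / "vocab.txt"
    if not vocab_file.is_file():
        raise FileNotFoundError(f"no vocab.txt in {model_dir}")
    # Passing only vocab_file makes transformers build the slow WordPiece
    # tokenizer and convert it to a fast one (backed by the tokenizers library).
    tokenizer = BertTokenizerFast(vocab_file=str(vocab_file), do_lower_case=True)
    # Writes tokenizer_config.json, tokenizer.json and related files
    # (rewriting vocab.txt as well) into model_dir.
    tokenizer.save_pretrained(model_dir)
    return tokenizer
```

Usage, with a hypothetical local path: call export_tokenizer_files("scibert_scivocab_uncased") once, then rerun the command above with the same --model_name_or_path so the script can find the tokenizer files in that folder.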
What am I missing there?
Thanks for the help!