FutureWarning about BertTokenizer.from_pretrained() in the latest version

ccfeidao · June 6, 2022, 7:22am

I call BertTokenizer.from_pretrained('file path'), where the file is a vocab.txt that I wrote by hand. This raises a FutureWarning. How can I keep loading my own tokenizer from my handwritten vocab.txt?

FutureWarning: Calling BertTokenizer.from_pretrained() with the path to a single file or url is deprecated and won't be possible anymore in v5. Use a model identifier or the path to a directory instead.

My vocab.txt looks like this:

[PAD]
[UNK]
[CLS]
[SEP]
[MASK]
...
my_word1
my_word2
my_word3

My goal is to train on a new “language”, so the tokenizer needs to be specified manually rather than loaded directly from a pretrained model.
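
For reference, the warning itself suggests passing a directory instead of a single file. A minimal sketch of that approach, assuming a transformers 4.x install: put vocab.txt inside a directory and pass the directory path to from_pretrained(). The vocabulary below is a shortened, hypothetical stand-in for the real file (the elided "..." entries are left out), and the directory names are placeholders.

```python
import os
import tempfile

from transformers import BertTokenizer

# Hypothetical stand-in for the handwritten vocab.txt (one token per line).
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "my_word1", "my_word2", "my_word3"]

# Place vocab.txt inside a directory; BertTokenizer looks for a file with
# exactly this name when it is given a directory path.
tokenizer_dir = tempfile.mkdtemp()
with open(os.path.join(tokenizer_dir, "vocab.txt"), "w", encoding="utf-8") as f:
    f.write("\n".join(vocab) + "\n")

# Pass the directory, not the file, to from_pretrained().
tokenizer = BertTokenizer.from_pretrained(tokenizer_dir)

# Optionally save the tokenizer so the directory also holds its config files,
# then reload it from there later.
save_dir = os.path.join(tokenizer_dir, "saved")
tokenizer.save_pretrained(save_dir)
reloaded = BertTokenizer.from_pretrained(save_dir)

# Direct vocabulary lookup; tokens missing from vocab.txt map to [UNK].
ids = reloaded.convert_tokens_to_ids(["my_word1", "my_word2", "not_in_vocab"])
print(ids)
```

Token ids follow the line order in vocab.txt, so the special tokens at the top of the file get the lowest ids. Note that BertTokenizer's default pre-tokenization splits on punctuation, including underscores, so entries such as my_word1 are looked up directly here rather than produced by tokenizing raw text.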
