When I use my own vocab to pretrain, my code is:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('my_vocab.txt')
The output is:
FutureWarning: This dataset will be removed from the library soon, preprocessing should be handled with the Datasets library. You can have a look at this example script for pointers: transformers/run_mlm.py at master · huggingface/transformers · GitHub
How can I do this correctly in the future? I don't understand this warning.