Hi,
I am trying to add custom tokens using the code below:
# Let's see how to increase the vocabulary of Bert model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')
num_added_toks = tokenizer.add_tokens(['token_1'])
print('We have added', num_added_toks, 'tokens')
model.resize_token_embeddings(len(tokenizer)) # Note: resize_token_embeddings expects the full size of the new vocabulary, i.e. the length of the tokenizer.
However, when I run the code above, I get this error:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-36-31798d520617> in <module>()
1 # Let's see how to increase the vocabulary of Bert model and tokenizer
----> 2 tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
3 model = AutoModelForMaskedLM.from_pretrained('bert-base-uncased')
4
5 num_added_toks = tokenizer.add_tokens(['token_1'])
NameError: name 'BertTokenizer' is not defined
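A NameError like this usually means the name was never imported in the current session, and the snippet above shows no import line. Below is a minimal sketch of the same steps with explicit imports from the transformers package. To keep it runnable without downloading anything, it builds a tiny tokenizer and a randomly initialized model instead of loading 'bert-base-uncased'. The toy vocabulary and the small config values are assumptions made for illustration only and are not part of the original code.

```python
import os
import tempfile

# Importing the classes explicitly is what avoids the NameError.
from transformers import BertConfig, BertForMaskedLM, BertTokenizer

# Hypothetical toy vocabulary, written to a temporary file so nothing is downloaded.
toy_vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "hello", "world"]
tmp_dir = tempfile.mkdtemp()
vocab_path = os.path.join(tmp_dir, "vocab.txt")
with open(vocab_path, "w", encoding="utf-8") as f:
    f.write("\n".join(toy_vocab) + "\n")

tokenizer = BertTokenizer(vocab_file=vocab_path)

# Small, randomly initialized model (illustrative sizes, not bert-base-uncased).
config = BertConfig(
    vocab_size=len(tokenizer),
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertForMaskedLM(config)

# Same steps as in the question: add a token, then resize the embeddings
# to the full size of the new vocabulary.
num_added_toks = tokenizer.add_tokens(["token_1"])
print("We have added", num_added_toks, "tokens")
model.resize_token_embeddings(len(tokenizer))
print("Embedding rows:", model.get_input_embeddings().weight.shape[0])
```

With the pretrained checkpoint, the same import line should apply: add AutoModelForMaskedLM to it and keep the two from_pretrained('bert-base-uncased') calls from the question unchanged.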