I followed the code in BramVanroy's post (Generate raw word embeddings using transformer models like BERT for downstream process - #2 by BramVanroy) to get the embeddings for each word. However, I have a file that contains more than 20,000 sentences, and many of its tokens throw an error like this:
ValueError: 'p' is not in list
The list returned by sent.split() is:
['smoke', 'coming', 'out', 'of', 'stack', 'with', 'green', 'traffic', 'light', 'and', 'picture', 'of', 'mr', 'peanut', 'on', 'building']
and the value being looked up is:
p
Here the error comes from the word peanut.
I can already add a new token manually, but how can I add a condition so that, if a word from the list does not exist in the vocabulary, it is added as a new token automatically?
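
For illustration, here is a minimal sketch of that condition. It assumes `tokenizer` and `model` are Hugging Face objects (for example loaded with `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained`) and relies only on `get_vocab()`, `add_tokens()` and `resize_token_embeddings()`. The helper name `add_missing_words` is hypothetical and is not part of the linked post.

```python
def add_missing_words(tokenizer, model, sentences):
    """Add every whitespace-separated word missing from the vocab as a new token.

    Assumption: `tokenizer` behaves like a Hugging Face tokenizer and `model`
    like a Hugging Face model. Returns the list of words that were added.
    """
    vocab = tokenizer.get_vocab()  # dict mapping token string -> token id
    missing = []
    seen = set()
    for sent in sentences:
        for word in sent.split():
            # The condition asked about: only collect words not yet in the vocab.
            if word not in vocab and word not in seen:
                seen.add(word)
                missing.append(word)
    if missing:
        tokenizer.add_tokens(missing)
        # The embedding matrix has to grow to cover the new token ids.
        model.resize_token_embeddings(len(tokenizer))
    return missing

# Hypothetical usage, with sentences read from the file:
# added = add_missing_words(tokenizer, model, sentences)
```

Note that embeddings for newly added tokens start out randomly initialized, so without further fine-tuning they may not be meaningful for downstream use.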