Hi there!
I’m new to this forum so I hope I’m posting this in the right place…
I am new to GPT-2 and the Hugging Face library, and I'm trying to figure out how to use them for my purposes. I am currently trying to compare the probabilities of GPT-2's predicted tokens with the probabilities of the actual tokens in an excerpt (using a random book for now). My problem is that sometimes the actual word doesn't appear in the vocab list, so I can't get a probability for it. What could I do to overcome this? One example is 'clocks', where I'm thinking I may just have to go with the lemmatized word. Another is 'striking', which cannot be lemmatized any further but still isn't in the vocab.
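One idea I'm considering, in case it helps frame the question: as far as I understand, GPT-2 uses byte-level BPE, so a word that has no single vocab entry may be split into several subword pieces, and pieces that start a new word carry a leading "Ġ". If that's right, I could sum the log-probabilities of the pieces to get a word-level probability. Below is a minimal sketch of that aggregation only. It does not call the Hugging Face API, `word_log_probs` is a hypothetical helper of my own, and the token pieces and log-probabilities are made up for illustration (I have not checked how GPT-2 actually splits these words).

```python
import math


def word_log_probs(tokens, token_log_probs):
    """Group subword tokens into words and sum their log-probabilities.

    Assumes GPT-2 style token strings, where a leading "Ġ" marks a token
    that begins a new word (i.e. it was preceded by a space).
    """
    if len(tokens) != len(token_log_probs):
        raise ValueError("tokens and token_log_probs must have the same length")

    words = []  # each entry is [word_text, summed_log_prob]
    for tok, lp in zip(tokens, token_log_probs):
        starts_word = tok.startswith("Ġ")
        piece = tok[1:] if starts_word else tok
        if starts_word or not words:
            # Begin a new word with this piece.
            words.append([piece, lp])
        else:
            # Continuation piece: extend the current word and add its
            # log-probability (chain rule in log space).
            words[-1][0] += piece
            words[-1][1] += lp
    return [(text, lp) for text, lp in words]


# Hypothetical pieces and made-up log-probabilities, for illustration only.
tokens = ["Ġthe", "Ġclocks", "Ġwere", "Ġstri", "king"]
token_log_probs = [-1.0, -6.0, -0.5, -7.0, -0.25]

result = word_log_probs(tokens, token_log_probs)
for word, lp in result:
    print(f"{word}: log p = {lp:.2f}, p = {math.exp(lp):.5f}")
```

In a real run, the pieces would come from the tokenizer and the log-probabilities from the model's output at each position, but I'm not sure whether this is the right way to go about it.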
Many thanks!
Rain