Is there a way to check whether a model knows a specific word?
I’ve got a list of names and, for evaluation, I would like to check whether the model knows any of these names.
I’m currently training bert-base-german-cased, and I want to run this check before and after training. After training, the names should be known, since they are included in the training set.
Edit:
I tried the following code:
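from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
vocab = tokenizer.get_vocab()  # token string -> id, built once outside the loop

def check(name):
    # True only if every whitespace-separated word of the name
    # is a single entry in the tokenizer's vocabulary.
    for s in name.split():
        if s not in vocab:
            print(f'{name} not found')
            return False
    return True

# names: the list of names to evaluate (defined elsewhere)
passed = []
for name in names:
    if check(name):
        passed.append(name)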
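Instead of a yes/no lookup, it can help to record how each name is actually tokenized. A minimal sketch, assuming `tokenize` is any callable that maps a string to a list of token strings (for example `tokenizer.tokenize` on a Hugging Face tokenizer); `split_report` and `fully_in_vocab` are hypothetical helper names, not library functions.

```python
from typing import Callable, Dict, List


def split_report(
    names: List[str], tokenize: Callable[[str], List[str]]
) -> Dict[str, List[List[str]]]:
    """Map each name to the token pieces of each of its whitespace-separated words.

    This mirrors the name.split() loop in check() above, but keeps the pieces
    instead of only returning True/False.
    """
    return {name: [tokenize(word) for word in name.split()] for name in names}


def fully_in_vocab(report: Dict[str, List[List[str]]], unk: str = "[UNK]") -> List[str]:
    """Names whose every word is exactly one token that is not the unknown token."""
    return [
        name
        for name, words in report.items()
        if all(len(pieces) == 1 and pieces[0] != unk for pieces in words)
    ]
```

With a real tokenizer this would be called as `split_report(names, tokenizer.tokenize)`; the pieces it reports depend entirely on that tokenizer’s vocabulary.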
I don’t think this is a good solution, though. Names like “Microsoft” or “Google” are found, while “Walmart” and “FedEx” are not.
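A likely reason for this mismatch is that BERT tokenizers use WordPiece: a word that is not in the vocabulary as a whole is split into subword pieces (continuation pieces carry a `##` prefix) rather than rejected. So a whole-word lookup in `tokenizer.vocab` only tells you whether a name happens to be a single token. Note also that fine-tuning updates the model weights but not the tokenizer’s vocabulary, so such a lookup would likely give the same result before and after training. The minimal sketch below shows greedy longest-match-first WordPiece splitting; the vocabulary is a hypothetical toy, not the real bert-base-german-cased vocabulary, so the splits it produces are only illustrative. With the real tokenizer, `tokenizer.tokenize(name)` shows the actual pieces.

```python
# Hypothetical toy vocabulary, NOT the real bert-base-german-cased vocab.
TOY_VOCAB = {"Microsoft", "Google", "Wal", "##mart", "Fed", "##E", "##x", "[UNK]"}


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into vocabulary pieces, greedily taking the longest match."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # prefix marking a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # nothing matches here: the whole word becomes [UNK]
        pieces.append(piece)
        start = end
    return pieces


for name in ["Microsoft", "Google", "Walmart", "FedEx"]:
    print(name, wordpiece(name, TOY_VOCAB))
```

In this toy setup “Microsoft” and “Google” come back as single tokens, while “Walmart” and “FedEx” are still representable, just as several pieces, which is exactly the case a whole-word vocabulary lookup reports as “not found”.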