I’m fine-tuning a BERT model for a NER task. I have a labeled dataset, but the results are not as good as I expected. So I wonder: if I replace some words with random strings and then fine-tune, will this strategy force the model to learn more from the context rather than from the word embeddings? My reasoning is that the random strings will not be in the pretrained model’s vocabulary, so the model would have no useful embedding for them and would have to rely on the surrounding words.
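To make the idea concrete, here is a minimal sketch of the augmentation I have in mind, applied at the word level before subword tokenization. All names and the replacement probability are just placeholders for illustration, not part of any existing library:

```python
import random
import string

def replace_with_random_strings(words, labels, replace_prob=0.15, rng=None):
    """Replace a fraction of words with random strings of similar length.

    `words` and `labels` are parallel lists (one NER tag per word).
    The random strings are almost surely out-of-vocabulary, so a
    pretrained tokenizer will split them into uninformative subwords,
    forcing the model to rely on context instead of word identity.
    Labels are left unchanged so entity spans keep their tags.
    """
    rng = rng or random.Random(0)
    out = []
    for word in words:
        if rng.random() < replace_prob:
            word = "".join(
                rng.choice(string.ascii_lowercase) for _ in range(max(1, len(word)))
            )
        out.append(word)
    return out, labels

words = ["Barack", "Obama", "visited", "Paris", "yesterday"]
labels = ["B-PER", "I-PER", "O", "B-LOC", "O"]
aug_words, aug_labels = replace_with_random_strings(words, labels, replace_prob=0.3)
```

The augmented sentences would then go through the usual subword tokenization and label alignment before fine-tuning. Is this likely to help, or will the mangled subwords just add noise?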