In my language, some words are truly random, but they dramatically influence later words.

How can I mark these words so that they are not used as labels (prediction targets) but are still included as input features when training GPT-2 from scratch in torch?
I'm currently using this setup and getting incredible results, but I believe the random words in every sentence are producing too much noise for the model to break 0.1515 NLL.
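
For reference, one common way to do this in plain PyTorch is to leave the random words in the input but set their label to the `ignore_index` of the cross-entropy loss (default `-100`), so they still condition later predictions and contribute nothing to the loss. Below is a minimal sketch, assuming you can build a boolean mask of which tokens are random; `is_random` and `masked_lm_loss` are hypothetical names, and the random logits only stand in for your model's output.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # default ignore_index of F.cross_entropy


def masked_lm_loss(logits, input_ids, is_random):
    """Next-token NLL that skips positions whose target is a random word.

    logits:    (batch, seq, vocab) model outputs
    input_ids: (batch, seq) token ids; random words stay in the input
    is_random: (batch, seq) bool, True where the token is a random word
               (hypothetical mask, built from your own data)
    """
    labels = input_ids.clone()
    labels[is_random] = IGNORE_INDEX  # never ask the model to predict these

    # Standard causal shift: position t predicts token t + 1.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]

    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=IGNORE_INDEX,  # masked targets are left out of the mean
    )


# Toy example: tokens 7 and 9 play the role of random words.
torch.manual_seed(0)
vocab_size = 10
input_ids = torch.tensor([[1, 7, 2, 3, 9, 4]])
is_random = torch.tensor([[False, True, False, False, True, False]])
logits = torch.randn(1, 6, vocab_size)  # stand-in for model(input_ids)

loss = masked_lm_loss(logits, input_ids, is_random)
print(loss.item())
```

The loss is averaged only over the non-random targets, so the reported NLL will not be directly comparable to a run that scores every token. If every target in a batch is masked, the mean is taken over zero elements and the loss comes out as NaN, so it is worth guarding against that case. If your GPT-2 happens to be Hugging Face's `GPT2LMHeadModel`, passing `labels` with `-100` at those positions should have the same effect, since that model shifts the labels internally.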