In my language, some words are truly random but dramatically influence the words that follow them.

How can I mark these words so that they are not used as labels (prediction targets) but are still included in the input features when training GPT-2 from scratch in PyTorch?
I'm currently using run_clm.py and getting incredible results, but I believe the random words in every sentence are producing too much noise for the model to get below an NLL of 0.1515.
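
To make the question concrete, here is a minimal sketch of the kind of label masking I have in mind. It relies on the fact that PyTorch's CrossEntropyLoss skips targets equal to its ignore_index (default -100), and that the Hugging Face causal LM models ignore label positions set to -100 while still feeding every token in input_ids to the model. The function names, the random_token_ids set, and the token ids are hypothetical and only for illustration; how the random words are identified, and where this would hook into run_clm.py, are assumptions.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_random_labels(input_ids, random_token_ids, ignore_index=IGNORE_INDEX):
    """Copy input_ids into labels, replacing random tokens with ignore_index.

    The tokens stay in input_ids, so the model still conditions on them;
    only their contribution to the loss is removed.
    """
    return [ignore_index if tok in random_token_ids else tok for tok in input_ids]


def add_masked_labels(examples, random_token_ids):
    """Batched variant (hypothetical helper) for a dict of tokenized examples."""
    examples["labels"] = [
        mask_random_labels(ids, random_token_ids) for ids in examples["input_ids"]
    ]
    return examples


# Toy example: token id 907 stands in for a "truly random" word (made up).
input_ids = [12, 907, 33, 451, 907, 8]
random_token_ids = {907}

labels = mask_random_labels(input_ids, random_token_ids)
print(labels)  # [12, -100, 33, 451, -100, 8]

batch = add_masked_labels({"input_ids": [input_ids, [907, 5]]}, random_token_ids)
print(batch["labels"])  # [[12, -100, 33, 451, -100, 8], [-100, 5]]
```

The labels are deliberately left unshifted here, because the Hugging Face GPT-2 LM head model shifts them internally when computing the loss. If a random word is split into several subword tokens, every one of those token positions would presumably need to be masked, so masking by position may be more reliable than masking by token id.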