In my language, some words are truly random but influence later words dramatically

wesboyt · May 28, 2023, 8:55pm

How can I denote that these words should not be used as labels, but should still be included as input features, when training GPT-2 from scratch in PyTorch?
I'm currently using run_clm.py and getting incredible results, but I believe the random words in every sentence are producing too much noise to get below an NLL of 0.1515.
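
To make the question concrete, here is a minimal sketch of the kind of label masking I have in mind. It assumes the random words map to a known set of token ids (the ids 7 and 9 below are made up), and the helper name `mask_random_labels` is hypothetical, not something from run_clm.py. It relies on the fact that `torch.nn.CrossEntropyLoss` skips targets equal to its default `ignore_index` of -100, so masked positions would still be fed to the model as inputs but would not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_random_labels(input_ids, random_token_ids, ignore_index=IGNORE_INDEX):
    """Return a labels list: a copy of input_ids where every token that
    belongs to random_token_ids is replaced by ignore_index."""
    random_token_ids = set(random_token_ids)
    return [ignore_index if tok in random_token_ids else tok for tok in input_ids]


# Hypothetical example: token ids 7 and 9 stand in for the "truly random" words.
example = {"input_ids": [5, 7, 3, 9, 2]}
example["labels"] = mask_random_labels(example["input_ids"], random_token_ids={7, 9})
print(example["labels"])  # [5, -100, 3, -100, 2]
```

The labels keep the same length as the inputs, so the random words remain visible as context for later predictions. If the random words are identified by position rather than by token id, the same idea would apply with a per-position mask instead of a set of ids.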
