Hello, I have a theoretical question regarding training a language model on the same documents across several tasks.
In a setting where one works with a small amount of in-domain, specialized data (roughly 300 MB of text files), I'd like to know whether there is evidence against using the same data for both continual pre-training and downstream tasks, such as text classification.
In the above, any document of the corpus would be used to train on the Masked Language Modeling task, and would later also be used to fine-tune the model on the text classification task.
Another, more general example would be using T5, which is remarkably easy to use in a multitask fashion: by just changing the task prefix at the beginning of the input, I can signal that the task has changed.
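To make the setup concrete, here is a minimal sketch of what I mean: the same document would appear in the training set under two different task prefixes, one for the pre-training-style objective and one for classification. The prefixes and the helper function below are illustrative, not T5's official ones.

```python
# Sketch: one document, two training examples distinguished only by a
# task prefix (T5-style text-to-text framing). The prefixes and the
# example label are made up for illustration.
def make_examples(document: str, label: str) -> list[tuple[str, str]]:
    """Return (input, target) pairs reusing the same document for two tasks."""
    return [
        # continual pre-training style objective on the raw document
        ("denoise: " + document, document),
        # downstream classification on the very same document
        ("classify: " + document, label),
    ]

pairs = make_examples("The product works great.", "positive")
for inp, tgt in pairs:
    print(inp, "->", tgt)
```

Both pairs would then be mixed into the same training set, which is exactly the situation I am asking about.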
Could I have several such tasks, drawing on the same documents, in the same training data set? Is there evidence of such a maneuver leading to negative learning effects?
P.S.: This is my first topic on the forum, so please do not hesitate to tell me if my question lacks clarity or is inappropriate in any way.