Dataset for fake news detection, fine tune or pre-train

Is there any dataset for fake news detection (as opposed to sentiment analysis)? I have one, NELA-GT, but would I then need to pre-train on it from scratch?
Are there any recommended methods, and am I on the right page? https://huggingface.co/transformers/training.html

I want to use a BERT model.

thanks

You could get a baseline by fine-tuning first, before going for pre-training, and then decide based on the results.
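A fine-tuning baseline could look roughly like the sketch below. It assumes the `transformers` and `torch` packages are installed and that your articles are already loaded as a list of strings with one label each. The label names (`reliable`, `unreliable`), the `bert-base-uncased` checkpoint, the output directory, and the hyperparameters are illustrative assumptions, not something NELA-GT or this thread prescribes.

```python
# Hypothetical label names; adapt them to however your dataset is annotated.
LABEL2ID = {"reliable": 0, "unreliable": 1}


def encode_labels(raw_labels):
    """Map string labels to the integer ids the classifier head expects."""
    return [LABEL2ID[label] for label in raw_labels]


def fine_tune_baseline(texts, labels, model_name="bert-base-uncased",
                       output_dir="bert-fake-news"):
    """Fine-tune a pre-trained BERT checkpoint as a two-class classifier."""
    # Imported lazily so encode_labels works without these packages installed.
    import torch
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    encodings = tokenizer(texts, truncation=True, padding=True, max_length=512)

    class NewsDataset(torch.utils.data.Dataset):
        """Wraps the tokenized texts and integer labels for the Trainer."""

        def __len__(self):
            return len(labels)

        def __getitem__(self, idx):
            item = {key: torch.tensor(val[idx]) for key, val in encodings.items()}
            item["labels"] = torch.tensor(labels[idx])
            return item

    # Reuses the pre-trained weights and adds a fresh classification head.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=len(LABEL2ID))
    args = TrainingArguments(output_dir=output_dir,
                             num_train_epochs=3,              # assumed value
                             per_device_train_batch_size=16)  # assumed value
    trainer = Trainer(model=model, args=args, train_dataset=NewsDataset())
    trainer.train()
    return trainer
```

You would call it as `fine_tune_baseline(texts, encode_labels(raw_labels))`, ideally holding out part of the data for evaluation before deciding whether extra pre-training is worth it.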

This thread has a nice pointer for pre-training.

thanks, so it seems I need to know the difference between pre-training and fine-tuning?

These resources should help.

Thanks, your information is always very useful.

If I understand correctly, pre-training on a corpus is unsupervised, meaning it uses a large amount of text without any labels, whereas for fine-tuning we need labels?

Yes, in modern NLP, models are pre-trained using unsupervised objectives (masked language modeling, auto-regressive LM, document denoising, etc.).

The downstream tasks (classification, QA, etc.) are supervised. Again, the resources above should help you understand the difference better.
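To make the difference concrete, here is a small pure-Python sketch of how a masked language modeling example is built from raw text alone. The function name, the whitespace tokenization, and the seed are assumptions for illustration. It is also a simplification: BERT's actual recipe replaces some of the selected tokens with random tokens or leaves them unchanged instead of always using `[MASK]`.

```python
import random

MASK_TOKEN = "[MASK]"


def make_mlm_example(tokens, mask_prob=0.15, seed=0):
    """Hide random tokens; the hidden tokens themselves become the targets."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_TOKEN)
            targets.append(token)  # the model must recover the original token
        else:
            inputs.append(token)
            targets.append(None)   # this position does not contribute to the loss
    return inputs, targets


# Pre-training: the only input is unlabeled text.
sentence = "the senator denied the report on tuesday".split()
masked_inputs, mlm_targets = make_mlm_example(sentence, mask_prob=0.3, seed=1)

# Fine-tuning: every text needs a human-provided label (hypothetical example).
labeled_example = ("the senator denied the report on tuesday", 1)
```

The targets in the first case come from the text itself, which is why no annotation is needed; in the second case the label has to be supplied by someone.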

@valhalla thanks, you have so much knowledge, thanks for sharing with newbies.