Is there a dataset for fake news detection (as opposed to sentiment analysis)? I have NELA-GT, but would I then need to pre-train from scratch?
Are there any methods? Am I on the right page here: https://huggingface.co/transformers/training.html ?
I want to use the BERT model.
Thanks!
You could try to get a baseline with fine-tuning before going for pre-training, and then make a decision based on the results.
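As a rough sketch of what that fine-tuning baseline could look like (the CSV files, column names, and binary label scheme below are just assumptions for illustration; adapt them to however you load NELA-GT):

```python
# Minimal fine-tuning sketch with the Trainer API; the CSV paths,
# column names, and label count are assumptions, not NELA-GT specifics.
from transformers import (
    BertForSequenceClassification,
    BertTokenizerFast,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# Assumed binary fake-news labels: 0 = real, 1 = fake
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical CSV files with "text" and "label" columns
dataset = load_dataset(
    "csv", data_files={"train": "train.csv", "test": "test.csv"}
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-fakenews", num_train_epochs=3),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```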
This thread has nice pointers for pre-training:
BERT has been trained on the MLM and NSP objectives. I wanted to train BERT with/without the NSP objective (with NSP in case the suggested approach is different). I haven’t performed pre-training in the full sense before. Can you please share how to obtain the data on which BERT was trained (the crawl and tokenization details that were used)? Since it takes a lot of time, I am looking for well-tested code that can yield BERT with/without NSP in one go. Any suggestions will be helpful.
I know about some pr…
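If you end up going the MLM-only route (no NSP), here is a minimal pre-training sketch with the Trainer; the corpus file name is a placeholder. For a battle-tested script, the transformers repo also ships run_mlm.py under examples/pytorch/language-modeling.

```python
# MLM-only pre-training sketch (no NSP objective). For MLM+NSP you would
# use BertForPreTraining and build sentence-pair inputs instead.
# "corpus.txt" is a placeholder for your own text corpus.
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM(BertConfig())  # fresh random weights, i.e. from scratch

raw = load_dataset("text", data_files={"train": "corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# The collator masks 15% of tokens and creates the MLM labels on the fly
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-mlm-scratch"),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```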
Thanks! So it seems I need to understand the difference between pre-training and fine-tuning?
These resources should help:
2021 Update: I created this brief and highly...
Thanks, your information is always very useful.
If I understand correctly, pre-training on a corpus is unsupervised, by which I mean it uses a large amount of text without any labels, whereas for fine-tuning we need labels?
Yes, in modern NLP the models are pre-trained using unsupervised objectives (masked language modeling, auto-regressive LM, document denoising, etc.),
and the downstream tasks (classification, QA, etc.) are supervised. Again, the above resources should help you understand the difference better.
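To make the contrast concrete, here is a small sketch: in pre-training the "labels" are manufactured from the raw text itself by the masking collator, while in fine-tuning each example needs a human-provided label (the label scheme below is assumed for illustration):

```python
# Pre-training: labels are created automatically by masking the raw text,
# so no human annotation is needed.
from transformers import BertTokenizerFast, DataCollatorForLanguageModeling

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

encoded = tokenizer("No labels were attached to this sentence.")
batch = collator([encoded])
# batch["labels"] now holds the original token ids at masked positions and
# -100 (the ignore index) everywhere else -- generated from the text alone.

# Fine-tuning: each example needs a label supplied by a human annotator, e.g.
labeled_example = {"text": "Scientists confirm the finding.", "label": 0}  # 0 = real (assumed)
```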
@valhalla thanks, you have so much knowledge. Thanks for sharing it with newbies.