Is there any dataset for fake news detection (as opposed to sentiment analysis)? I have one, NELA-GT, but would I then need to pre-train on it from scratch?
Are there any recommended methods, and am I on the right page? https://huggingface.co/transformers/training.html
I want to use a BERT model.
You could try to get a baseline with fine-tuning before going for pre-training, and then make a decision based on the results.
This thread has nice pointers for pre-training.
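For the fine-tuning baseline suggested above, here is a minimal sketch of how it might be set up with the transformers Trainer API. The label names, the `encode_labels` and `build_trainer` helpers, the output directory, and the hyperparameters are all assumptions for illustration; nothing here is specific to NELA-GT, and the `bert-base-uncased` checkpoint is just one common choice.

```python
from typing import Dict, List

# Hypothetical label set for a binary fake-news task (an assumption, not taken from NELA-GT).
LABELS = ["reliable", "unreliable"]


def encode_labels(names: List[str]) -> List[int]:
    """Map string labels to the integer ids a classification head expects."""
    index: Dict[str, int] = {name: i for i, name in enumerate(LABELS)}
    return [index[name] for name in names]


def build_trainer(texts: List[str], label_names: List[str],
                  output_dir: str = "bert-fake-news"):
    """Set up fine-tuning of a pre-trained BERT checkpoint with a new classification head."""
    # Imported lazily so encode_labels can be used without the heavy dependencies.
    import torch
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    # Reuse the pre-trained weights; only the classification head starts from scratch.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(LABELS)
    )
    encodings = tokenizer(texts, truncation=True, padding=True, max_length=256)
    label_ids = encode_labels(label_names)

    class NewsDataset(torch.utils.data.Dataset):
        def __len__(self):
            return len(label_ids)

        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in encodings.items()}
            item["labels"] = torch.tensor(label_ids[i])
            return item

    # Illustrative hyperparameters only; tune them on a held-out split.
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=16,
    )
    return Trainer(model=model, args=args, train_dataset=NewsDataset())
```

Calling `build_trainer(texts, label_names).train()` would then run the fine-tuning. If the results from this baseline are good enough, further pre-training on the news corpus may not be needed.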
thanks, so it seems I need to know the difference between pre-training and fine-tuning?
These resources should help.
thanks, your information is always very useful
If I am correct, pre-training on any corpus is unsupervised, by which I mean the text is a large amount of data without any labels, whereas in fine-tuning we should have labels?
Yes, in modern NLP, models are pre-trained using an unsupervised objective (masked language modeling, auto-regressive LM, document denoising, etc.).
And the downstream tasks (classification, QA, etc.) are supervised. Again, the above resources should help you understand the difference better.
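To make the labels distinction concrete, here is a small sketch in plain Python. The function names are hypothetical, and the masking is simplified: it always substitutes `[MASK]`, whereas BERT's actual recipe sometimes keeps the token or swaps in a random one. The point is only that in masked language modeling the targets come from the text itself, while in classification they come from an annotator.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def make_mlm_example(
    tokens: List[str], mask_prob: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Build a masked-LM training pair from raw tokens; no human labels are needed."""
    rng = random.Random(seed)
    inputs: List[str] = []
    targets: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)   # the model must recover the original token
        else:
            inputs.append(tok)
            targets.append(None)  # this position is ignored by the loss
    return inputs, targets


def make_classification_example(text: str, label: int) -> Tuple[str, int]:
    """A fine-tuning example: the target is supplied by an annotator, not by the text."""
    return text, label


if __name__ == "__main__":
    tokens = "the senate passed the bill on friday".split()
    print(make_mlm_example(tokens, mask_prob=0.3, seed=0))
    print(make_classification_example("the senate passed the bill on friday", 0))
```

So any large unlabeled corpus can feed the first function, but the second one only works once someone has labeled each article.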
@valhalla thanks, you have so much knowledge, thanks for sharing with newbies.