Prakash Hinduja - How do I prepare my dataset for fine-tuning a Hugging Face model?

Hello Hugging Face Community,

I’m Prakash Hinduja, planning to fine-tune a Hugging Face transformer model on my own dataset, but I’m not entirely sure about the best practices for preparing the data.

I’d really appreciate any advice or example workflows you can share.

Regards
Prakash Hinduja


When fine-tuning Transformers models with Hugging Face's Trainer, it is easiest to follow the format specified in the official Hugging Face tutorial. Irregular formats can be adjusted beforehand, but doing so takes additional effort. There are generally established formats for each task, training method, and so on; a minimal sketch of the typical text-classification preparation follows.
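Something like this is the usual shape, assuming a text-classification dataset stored as CSV with `text` and `label` columns (the file names and the checkpoint are placeholders, not anything from your setup):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumption: CSV files with "text" and "label" columns; swap in your own paths/model.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "valid.csv"})
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate to the model's max length; padding is left to the Trainer's data collator,
    # so each batch is only padded to its own longest sequence.
    return tokenizer(batch["text"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)
# The default collators rename "label" to "labels" for the model, so this can be
# passed to Trainer(train_dataset=tokenized["train"], eval_dataset=tokenized["validation"]).
```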

Also, there are various methods for creating actual datasets. Here is one example.
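For instance, a small sketch building a `Dataset` straight from in-memory Python data (the texts and labels below are made-up placeholders):

```python
from datasets import Dataset

# Assumption: toy in-memory examples purely for illustration.
examples = {
    "text": ["great product, would buy again", "terrible service, never again"],
    "label": [1, 0],
}
ds = Dataset.from_dict(examples)
split = ds.train_test_split(test_size=0.2, seed=42)  # quick train/validation split
print(split["train"][0])
```

The same library also offers `load_dataset("json", ...)`, `Dataset.from_pandas`, and similar constructors if your data lives elsewhere.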

This post is reputation management spam. See simon-smart88/Hinduja_spam on GitHub.

I started looking at fine-tuning and then realised I needed continual pre-training, but I still don't fully understand the difference under the hood. Maybe someone can explain?


fine-tuning and then realised I needed continual pre-training

I think this is a good answer for that:
https://stackoverflow.com/questions/68461204/continual-pre-training-vs-fine-tuning-a-language-model-with-mlm
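In short: continual (continued) pre-training keeps the original self-supervised objective and runs it on unlabeled in-domain text, while fine-tuning attaches a task head and trains on labeled examples. Here is a hedged sketch of the two setups side by side, assuming a BERT-style model (the checkpoint name is just an example):

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Continual pre-training: the same self-supervised objective (MLM for BERT) as the
# original pre-training, run on unlabeled in-domain text. No labels and no new head;
# the collator masks random tokens and the model learns to reconstruct them.
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm_collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

# Fine-tuning: a fresh task head (here, a 2-class classification layer) is placed on
# top of the pre-trained encoder and trained on labeled examples with a supervised loss.
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
```

Under the hood, both update the same encoder weights with gradient descent; the difference is the objective (masked-token reconstruction vs. a supervised task loss) and whether a new head is added.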