Hi, I have a fairly large NER dataset (about 12 million rows), and I am looking for an economical way to train a model on it. It appears that Hugging Face's AutoTrain feature could help. However, it seems that the training set needs to be uploaded in CSV format rather than as a dataset that has already been pre-processed (i.e., tokenized, aligned, etc.).
In general, is there a way to use Hugging Face's AutoTrain for NER models? Can I use my already pre-processed dataset? If not, what is the correct format to provide, for example as a CSV file?