AutoTrain for NER models

Hi, I have a fairly large NER dataset (about 12 million rows) and I am looking for an economical way to train a model on it. Hugging Face's AutoTrain looks like it could help, but it appears that the training set needs to be uploaded in CSV format rather than as a dataset that has already been pre-processed (i.e., tokenized, aligned, etc.).

In general, is there a way to use Hugging Face's AutoTrain for NER models? Can I use my already pre-processed dataset? If not, what is the correct format to provide in something like CSV?


AutoTrain does support processed datasets if they are on the Hub. However, this is a fairly large dataset, which might require a custom deployment. I suggest emailing us so we can discuss the details.
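
For anyone unsure what "processed" means here: for NER it usually means the word-level labels have already been aligned to the subword tokens. Below is a minimal sketch of that alignment step in plain Python. It is not AutoTrain's own code, and the names `align_labels`, `IGNORE_INDEX`, and the example ids are hypothetical. It assumes you already have a per-token list of word indices, in the shape a fast tokenizer's `word_ids()` returns, and it labels only the first subword of each word.

```python
from typing import List, Optional

# -100 is the value the PyTorch cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map word-level NER labels onto subword tokens.

    word_ids has one entry per token: the index of the word the token
    came from, or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special token: exclude it from the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a word: carry over the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later subword of the same word: exclude it from the loss.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned


# Hypothetical example: three words split into four subwords plus
# two special tokens, e.g. [CLS] hug ##ging face rocks [SEP].
word_ids = [None, 0, 0, 1, 2, None]
word_labels = [1, 2, 0]  # hypothetical label ids, e.g. B-ORG, I-ORG, O
labels = align_labels(word_ids, word_labels)
```

Once every row carries aligned labels like this, the dataset can be uploaded to the Hub, for example with `Dataset.push_to_hub` from the `datasets` library. Which column names AutoTrain expects is not covered here, so check that before uploading.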