Hugdatafast: huggingface/nlp + fastai
An integration that lets you use hundreds of huggingface/nlp datasets with fastai, plus some handy transforms for building concatenated datasets such as language model datasets.
pip install hugdatafast
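For readers new to concatenated language-model datasets, the general idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not hugdatafast's implementation or API. The function name `concat_and_chunk` and its `block_size` parameter are hypothetical, and the examples are assumed to be already tokenized into lists of integer IDs. The function joins all examples into one token stream and cuts it into fixed-length blocks, dropping the incomplete tail.

```python
from itertools import chain
from typing import List


def concat_and_chunk(examples: List[List[int]], block_size: int) -> List[List[int]]:
    """Concatenate tokenized examples and split the stream into equal-length blocks.

    Hypothetical helper for illustration only; it is not part of hugdatafast.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Flatten all examples into a single token stream.
    stream = list(chain.from_iterable(examples))
    # Keep only as many tokens as fill complete blocks; the remainder is dropped.
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(concat_and_chunk(examples, block_size=4))
# [[1, 2, 3, 4], [5, 6, 7, 8]]
```

Note that each block can mix the end of one example with the start of another, and the trailing token `9` is discarded because it does not fill a complete block.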
Doing NLP? See if you can turn your data pipeline into just 3 lines.
Updates will also be tweeted on my Twitter account (Richard Wang).
Update: added an example of preparing any huggingface/nlp dataset for a (traditional) language model, or of implementing a custom context window.
Update to the update: I have canceled this update.
Reason (just for notes; feel free to skip):
Originally I wanted to introduce LMTransform and CombineTransform, which apply a context window across examples. But then I realized there are few cases where we need a context window across examples. Examples in a dataset are often not consecutive, so there is no need to concatenate unrelated texts. These classes might therefore only be useful for my personal use case.
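To make the distinction concrete, here is a minimal sketch of a context window that stays inside each example, so unrelated texts are never joined. It is not the LMTransform or CombineTransform mentioned above; the function name `windows_per_example` and its `window` and `stride` parameters are hypothetical, and examples are assumed to be lists of token IDs. Each window yields an (input, target) pair with the target shifted by one token, as in a traditional language model.

```python
from typing import Iterator, List, Tuple


def windows_per_example(
    examples: List[List[int]], window: int, stride: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (input, target) pairs from sliding windows that never cross example boundaries.

    Hypothetical helper for illustration only; it is not part of hugdatafast.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    for tokens in examples:
        # A window of `window` inputs needs `window + 1` tokens so the target can be shifted by one.
        # Examples shorter than that produce no windows.
        for start in range(0, len(tokens) - window, stride):
            chunk = tokens[start:start + window + 1]
            yield chunk[:-1], chunk[1:]


pairs = list(windows_per_example([[1, 2, 3, 4, 5], [10, 11, 12]], window=2, stride=2))
print(pairs)
# [([1, 2], [2, 3]), ([3, 4], [4, 5]), ([10, 11], [11, 12])]
```

Because windows are generated per example, no pair ever contains tokens from both `[1, 2, 3, 4, 5]` and `[10, 11, 12]`; a cross-example window would instead slide over the concatenated stream.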