Hugdatafast: huggingface/nlp + fastai

An integration that lets you use the hundreds of datasets in huggingface/nlp with fastai, plus some handy transforms for building concatenated datasets such as language model datasets.

:inbox_tray: pip install hugdatafast
:open_book: Documentation:

Doing NLP?
See if you can turn your data pipeline into just 3 lines. :sunglasses:

Updates will also be tweeted from my Twitter account, Richard Wang.


Update: added an example of preparing any huggingface/nlp dataset for a (traditional) language model, or of implementing a custom context window.

Update to the update: I am cancelling that update.
Reason (just for notes, skipping is ok):
Originally I wanted to introduce LMTransform and CombineTransform, which can apply a context window across examples. But it occurred to me that there are few cases where we need a context window across examples. Examples in a dataset are often not consecutive, and we don't need to concatenate unrelated texts. So these classes might only be useful for my personal use case.
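
For readers wondering what a "context window across examples" means in practice, here is a minimal sketch in plain Python. It is not the hugdatafast implementation of LMTransform or CombineTransform; the function name `lm_windows` and its parameters are hypothetical and only illustrate the idea of concatenating tokenized examples into one stream and cutting it into fixed-length input/target pairs.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def lm_windows(
    examples: Iterable[List[int]],
    seq_len: int,
    sep_id: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (input, target) pairs of length seq_len from concatenated examples.

    Hypothetical helper for illustration only. The target is the input
    shifted by one token, as in a traditional language model setup.
    """
    buffer: List[int] = []
    for token_ids in examples:
        buffer.extend(token_ids)
        if sep_id is not None:
            # Optionally mark the boundary between (possibly unrelated) examples.
            buffer.append(sep_id)
        # Emit every full window currently available in the buffer.
        while len(buffer) >= seq_len + 1:
            chunk = buffer[: seq_len + 1]
            yield chunk[:-1], chunk[1:]
            # Keep the last token: it becomes the first input of the next window.
            buffer = buffer[seq_len:]
    # Leftover tokens that do not fill a whole window are dropped.


# Three short "examples" of token ids; windows span example boundaries.
pairs = list(lm_windows([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4))
```

Note that each window can mix tokens from neighbouring examples, which is exactly the behaviour that rarely makes sense when the examples in a dataset are not consecutive text.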
