Build an end-to-end NLP toolkit with transformers and datasets

I want to use transformers for a variety of tasks. However, it's not easy to modify the models and plug in my own dataset.

That's why I built a set of toolkits on top of transformers and datasets, to make these models more accessible and easier to use for beginners. :grinning:


Example

```shell
# install
pip install nlprep tfkit nlp2go nlp2
# download dataset
nlprep --dataset clas_snli --outdir dataset_clas_snli
# train
tfkit-train --maxlen 512 --epoch 5 --savedir ./snli_model/ --train ./dataset_clas_snli/snli-train.csv --test ./dataset_clas_snli/snli-test.csv --model clas --config distilroberta-base
# eval
tfkit-eval --model ./snli_model/1.pt --valid ./dataset_clas_snli/snli-validation.csv --metric clas
# interactive CLI for testing
nlp2go --model ./snli_model/1.pt --cli
```
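If you want to train on your own data instead of an nlprep-prepared dataset, you just need a CSV in the same shape as the generated files. As a minimal sketch, I'm assuming here a simple two-column layout (input text, label) for the `clas` task; the separator token and exact schema are assumptions, so check a generated `snli-train.csv` to confirm before using this:

```python
import csv

# Hypothetical example: write a tiny classification CSV in an assumed
# two-column (text, label) layout, similar to what nlprep produces
# for clas_snli. Verify against a real generated file first.
rows = [
    ("A man is playing a guitar. A person makes music.", "entailment"),
    ("A man is playing a guitar. Nobody is doing anything.", "contradiction"),
]

with open("my-train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)
```

A file like this could then be passed to `tfkit-train` via `--train` (and a matching validation file via `--test`).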

Here are all of my projects:
GitHub - voidful/NLPrep: 🍳 NLPrep - dataset tool for many natural language processing task - Turn dataset into a “ready to go” format for models.
GitHub - voidful/TFkit: 🤖📇 handling multiple nlp task in one pipeline - multi-task multi-model training and evaluation toolkit

I hope this can bring inspiration to others.
Is there anything in this process that could be improved?
In what ways could my work help the community?
