nlp 0.3.0 is out!

Yes, definitely! It’s already possible to load your own CSV or JSON files like this:

```python
from nlp import load_dataset

dataset = load_dataset('csv', data_files='my_file.csv')  # a single file
dataset = load_dataset('csv', data_files=['my_file_1.csv', 'my_file_2.csv', 'my_file_3.csv'])  # several files
# one file or a list of files per split
dataset = load_dataset('csv', data_files={'train': ['my_train_file_1.csv', 'my_train_file_2.csv'],
                                          'test': 'my_test_file.csv'})
```

(Replace `'csv'` with `'json'` to load JSON files, and soon with `'pandas'` to load pandas files as well.)
We also plan to add more options for loading datasets from your own data, both from external files and from data that is already loaded in memory in your Python session.
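To make the three accepted `data_files` forms concrete, here is a minimal pure-Python sketch that normalizes each of them into a `{split: [paths]}` mapping. `normalize_data_files` is a hypothetical helper, not part of the library, and it assumes that a bare path or a plain list of paths ends up in a single `train` split:

```python
from typing import Dict, List, Union

DataFiles = Union[str, List[str], Dict[str, Union[str, List[str]]]]


def normalize_data_files(data_files: DataFiles) -> Dict[str, List[str]]:
    """Hypothetical helper: map every accepted `data_files` form to {split: [paths]}.

    Assumption: a single path or a list of paths is treated as one "train" split.
    """
    if isinstance(data_files, str):
        return {"train": [data_files]}
    if isinstance(data_files, (list, tuple)):
        return {"train": list(data_files)}
    if isinstance(data_files, dict):
        # Each split may itself be a single path or a list of paths
        return {
            split: [files] if isinstance(files, str) else list(files)
            for split, files in data_files.items()
        }
    raise TypeError(f"Unsupported data_files type: {type(data_files).__name__}")


print(normalize_data_files({'train': ['a.csv', 'b.csv'], 'test': 'c.csv'}))
# {'train': ['a.csv', 'b.csv'], 'test': ['c.csv']}
```

The dict form is the one to reach for whenever you need explicit splits; the other two forms are shorthand.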

For data that is already in memory, such as a Python dict or a pandas DataFrame, have a look at this PR: https://github.com/huggingface/nlp/pull/350, which should be merged soon.
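For in-memory data, the main thing to get right is the shape. The sketch below assumes (this is not taken from the PR) that a dict-based loader expects a column-oriented mapping: column name → list of values, all of the same length. A pandas DataFrame can be turned into that shape with `df.to_dict(orient="list")`. The helpers `rows_to_columns` and `columns_to_rows` are hypothetical and only convert between a list of records and that columnar layout:

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical helper: turn a list of records into a dict of equal-length columns."""
    if not rows:
        return {}
    columns = list(rows[0].keys())
    for i, row in enumerate(rows):
        # Every record must have exactly the same fields as the first one
        if set(row) != set(columns):
            raise ValueError(f"Row {i} has columns {sorted(row)}, expected {sorted(columns)}")
    return {name: [row[name] for row in rows] for name in columns}


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: inverse of rows_to_columns; all columns must share one length."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"Columns have different lengths: {sorted(lengths)}")
    n_rows = lengths.pop() if lengths else 0
    return [{name: values[i] for name, values in columns.items()} for i in range(n_rows)]


records = [{'text': 'hello', 'label': 0}, {'text': 'world', 'label': 1}]
print(rows_to_columns(records))
# {'text': ['hello', 'world'], 'label': [0, 1]}
```

Checking column lengths up front catches the most common mistake with columnar dicts: ragged columns that would otherwise fail later during loading.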

Overall, we want to add more documentation and examples of use cases very soon.

Other exciting topics coming soon for the library are:

  • simple and efficient ways to index, encode and query dataset records
  • traceability and reproducibility features
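To give a rough idea of what "index and query records" can mean, here is a toy sketch using a plain inverted index (token → set of record ids) with AND queries. This is a stand-in technique chosen for illustration, not the design planned for the library; `tokenize`, `build_index` and `query` are hypothetical names:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Lowercase and split on word characters
    return re.findall(r"\w+", text.lower())


def build_index(records: List[str]) -> Dict[str, Set[int]]:
    """Map each token to the ids of the records that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for record_id, text in enumerate(records):
        for token in tokenize(text):
            index[token].add(record_id)
    return index


def query(index: Dict[str, Set[int]], text: str) -> List[int]:
    """Return the ids of records containing every token of the query (AND semantics)."""
    tokens = tokenize(text)
    if not tokens:
        return []
    # .get avoids inserting unknown tokens into the defaultdict
    matches = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(matches)


records = ["The cat sat", "A dog barked", "The dog sat down"]
index = build_index(records)
print(query(index, "the dog"))  # [2]
```

Building the index costs one pass over the data, after which each query only touches the id sets of its own tokens instead of rescanning every record.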