Nlp 0.3.0 is out!

thomwolf · July 8, 2020, 12:32pm

Yes definitely, it’s already possible to load your own CSV or JSON files like this:

from nlp import load_dataset

dataset = load_dataset('csv', data_files='my_file.csv')
dataset = load_dataset('csv', data_files=['my_file_1.csv', 'my_file_2.csv', 'my_file_3.csv'])
dataset = load_dataset('csv', data_files={'train': ['my_train_file_1.csv', 'my_train_file_2.csv'], 
                                          'test': 'my_test_file.csv'})

(Replace 'csv' with 'json', and soon with 'pandas' as well, to load from JSON and pandas files.)
We also plan to add more options for loading datasets from your own data, both from external files and from data that is already loaded in memory in your Python session.
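
To try the dict form of data_files end to end, here is a minimal sketch that first writes a few toy CSV files and then passes them to load_dataset. The helper names (write_csv, build_data_files, load_csv_splits) and the "text"/"label" columns are made up for illustration and are not part of the library; only load_dataset('csv', data_files=...) comes from the snippet above.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, rows, header=("text", "label")):
    """Write a small CSV file with a header row (column names are hypothetical)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return str(path)


def build_data_files(directory):
    """Create toy train/test CSV files and return a split -> file(s) mapping."""
    directory = Path(directory)
    return {
        "train": [
            write_csv(directory / "my_train_file_1.csv", [("good movie", 1)]),
            write_csv(directory / "my_train_file_2.csv", [("bad movie", 0)]),
        ],
        "test": write_csv(directory / "my_test_file.csv", [("fine movie", 1)]),
    }


def load_csv_splits(data_files):
    """Load the files with nlp; the import is local so the helpers above work without it."""
    from nlp import load_dataset

    return load_dataset("csv", data_files=data_files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_files = build_data_files(tmp)
        dataset = load_csv_splits(data_files)
        print(dataset)
```

Each key of the mapping names a split, and its value can be a single path or a list of paths, as in the last example above.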

For data that is already in memory, such as a Python dict or a pandas DataFrame, you can have a look at the PR on this, which should be merged soon: https://github.com/huggingface/nlp/pull/350

Overall, we want to add more documentation and example use cases very soon.

Other exciting topics coming soon for the library are:

simple and efficient ways to index, encode and query dataset records
traceability and reproducibility features.
