Hi,
Can I download the raw text as a file of a dataset?
Thanks,
Steve
There are a couple of options. I'll use the imdb dataset as an example.
As an arrow file:
from datasets import load_dataset
im = load_dataset('imdb')
im.save_to_disk(dataset_dict_path='./imdb')
This will save your files in the imdb directory.
Or convert to pandas and then save as CSV / JSON:
from datasets import load_dataset
im = load_dataset('imdb')
im.set_format('pandas')
df = im['train'][:]
df.to_csv('imdb_train.csv')
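If you literally want a plain .txt file with one example per line, a minimal sketch along these lines might work. The helper name write_text_column is hypothetical (it is not part of the datasets library), and it assumes each row is a dict-like record with a "text" column, which is what iterating over a split in its default format gives you (that is, before calling set_format('pandas')). Replacing embedded newlines with spaces is my own choice so that each example stays on a single line.

```python
from typing import Iterable, Mapping


def write_text_column(rows: Iterable[Mapping], path: str, column: str = "text") -> int:
    """Write one example per line to a plain-text file and return the line count.

    Hypothetical helper, not part of the datasets library.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # Assumption: flatten embedded newlines so each example stays on one line.
            f.write(" ".join(str(row[column]).splitlines()) + "\n")
            count += 1
    return count
```

With the im object from the snippets above, something like write_text_column(im['train'], 'imdb_train.txt') should then produce the raw text file.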
Hi! You can download text files from remote (or local) endpoints as follows:
from datasets import load_dataset
dset = load_dataset("text", data_files=<url>)