Can I download the raw text of a dataset?

Hi,
Can I download the raw text of a dataset as a file?

Thanks,
Steve

There are a couple of options. I'll use the imdb dataset as an example.

As an arrow file:

from datasets import load_dataset
im = load_dataset('imdb')
# save every split in Arrow format under ./imdb
im.save_to_disk('./imdb')

This saves the files in the ./imdb directory.

Or convert to pandas and then save as csv / json:

from datasets import load_dataset
im = load_dataset('imdb')
im.set_format('pandas')
df = im['train'][:]
df.to_csv('imdb_train.csv')
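
If you want a plain .txt file rather than csv, a small helper along these lines might do it. This is only a sketch: write_raw_text is a hypothetical name of my own (not part of the datasets library), and it assumes one example per line is the layout you want.

```python
from typing import Iterable


def write_raw_text(texts: Iterable[str], path: str) -> int:
    """Write one example per line to a UTF-8 text file and return the count.

    Hypothetical helper for illustration; not part of the datasets library.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            # Collapse runs of whitespace (including embedded newlines)
            # so that each example stays on a single line.
            f.write(" ".join(text.split()) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real text column.
    sample = ["A great film.", "Two lines\nin one review."]
    write_raw_text(sample, "sample.txt")
```

With the DataFrame from the snippet above, the call would be something like write_raw_text(df['text'], 'imdb_train.txt'), since the imdb reviews are in the 'text' column.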

Hi! You can download text files from remote (or local) endpoints as follows:

from datasets import load_dataset
dset = load_dataset("text", data_files=<url>)