Need to read a subset of data files in WMT14

I need to load a subset of the dataset files in WMT14, and the TEST and VALIDATION sets.
How can I do this? I have tried using split and data_files, but I was unable to get it to work. Kindly help.

Hi! I think you can just create your own wmt_14_gigafren_only dataset script.

To do so, you can download the Python files for wmt_14 from here: datasets/datasets/wmt14 at master · huggingface/datasets · GitHub. Then rename the script to match your new dataset name and edit it to keep only gigafren in the _subsets part of the code.
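As a rough illustration of the _subsets edit, here is a minimal sketch in plain Python. The dictionary is a hypothetical stand-in, not the real script: as far as I recall, the actual file keys this mapping by datasets.Split.TRAIN / VALIDATION / TEST, and its list of training sources may differ from the names below, so check your downloaded copy. The helper keep_train_sources is made up for this example.

```python
# Hypothetical stand-in for the `_subsets` mapping in the downloaded wmt14 script.
# Assumption: the real script uses datasets.Split.* keys instead of strings,
# and its exact list of training sources may differ from this one.
ORIGINAL_SUBSETS = {
    "train": ["commoncrawl", "europarl_v7", "gigafren", "newscommentary_v9", "undoc_2000_fr_en"],
    "validation": ["newsdev2014", "newstest2013"],
    "test": ["newstest2014"],
}


def keep_train_sources(subsets, wanted):
    """Return a copy of `subsets` whose train list contains only the `wanted` sources."""
    missing = [name for name in wanted if name not in subsets["train"]]
    if missing:
        raise ValueError(f"Unknown training sources: {missing}")
    # Copy every split so the original mapping is left untouched.
    trimmed = {split: list(sources) for split, sources in subsets.items()}
    trimmed["train"] = [name for name in subsets["train"] if name in wanted]
    return trimmed


# Validation and test stay as they were; only the training sources are trimmed.
GIGAFREN_ONLY = keep_train_sources(ORIGINAL_SUBSETS, ["gigafren"])
print(GIGAFREN_ONLY)
```

In the script itself you would not need a helper like this: you would simply hand-edit the value returned by _subsets so that the train entry lists only gigafren, leaving the validation and test entries alone.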

Finally, you just need to call load_dataset("path/to/") and you’re done :slight_smile: