Need to read a subset of data files in WMT14

I need to load a subset of the WMT14 data files: only gigafren for the training split, i.e.
dataset.split.Train:['gigafren']
plus the TEST and VALIDATION sets.
How can I do it? I have tried using the split argument and data_files, but I was unable to make it work. Kindly help.

Hi! I think you can just create your own wmt_14_gigafren_only dataset script.

To do so, download the Python files from wmt14 here: datasets/datasets/wmt14 at master · huggingface/datasets · GitHub (the loading script wmt14.py and its helper module wmt_utils.py). Then rename wmt14.py to wmt_14_gigafren_only.py and edit it so that the TRAIN entry of _subsets contains only gigafren, leaving the VALIDATION and TEST entries unchanged.
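For illustration, here is a sketch of what the edited _subsets could look like. It assumes your copy of the script defines _subsets as a property on the builder class returning a dict keyed by datasets.Split, as the upstream wmt14 script does. Only the TRAIN list changes; the validation and test names below are meant to mirror the original script, so if your copy lists different names, keep those instead.

```python
    # Excerpt: inside the builder class of wmt_14_gigafren_only.py
    @property
    def _subsets(self):
        return {
            # Training data restricted to the gigafren corpus only
            datasets.Split.TRAIN: ["gigafren"],
            # Validation and test sets kept as in the original wmt14 script
            datasets.Split.VALIDATION: ["newsdev2014", "newstest2013"],
            datasets.Split.TEST: ["newstest2014"],
        }
```

Keeping the dict structure intact means the rest of the script (split generation, download logic in wmt_utils.py) works unchanged; it simply has fewer training corpora to fetch.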

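Finally, you just need to call load_dataset("path/to/wmt_14_gigafren_only.py", "fr-en") and you're done :) The "fr-en" config name is needed because WMT14 has one config per language pair, and gigafren only covers French-English.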
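A slightly fuller version of that call, as a sketch: it assumes wmt_14_gigafren_only.py and wmt_utils.py sit in the same local folder (the script imports its helper from its own directory), and the path is a placeholder. Passing trust_remote_code=True is an assumption about your datasets version: releases that gate local loading scripts behind this flag need it, older releases may reject it (drop it there), and the newest releases no longer run loading scripts at all, so you may need to pin an older version.

```python
from datasets import load_dataset

# Placeholder path to the edited script; wmt_utils.py must be next to it
SCRIPT = "path/to/wmt_14_gigafren_only.py"

# "fr-en" picks the French-English config, the only pair gigafren belongs to.
# trust_remote_code=True: only for datasets versions that accept this flag.
wmt14 = load_dataset(SCRIPT, "fr-en", trust_remote_code=True)

# The printed DatasetDict lists each split that was built and its row count
print(wmt14)
```

Note that gigafren is a large corpus, so the first call still involves a sizeable download before the splits are generated.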