How to preprocess a wikipedia dataset using DataflowRunner?

Hi! Indeed, datasets/run_beam.py at main · huggingface/datasets · GitHub doesn't seem to support passing builder kwargs like date or language when instantiating the DatasetBuilder (builder_cls in the code).

Feel free to modify this script to suit your needs. If you want to open a PR to support passing builder kwargs, that could also benefit other people 🙂
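As a rough starting point for such a modification, here is a minimal sketch of how extra builder kwargs could be collected from the command line. The `--builder_kwarg` flag and the `parse_builder_kwargs` helper are hypothetical names made up for illustration; they are not part of the existing script, and I haven't checked how they would fit with its current arguments.

```python
import argparse


def parse_builder_kwargs(pairs):
    """Turn ["language=fr", "date=20220301"] into a dict of builder kwargs."""
    kwargs = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        kwargs[key] = value
    return kwargs


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical flag (an assumption, not an option of the existing script).
    # It can be repeated: --builder_kwarg language=fr --builder_kwarg date=20220301
    parser.add_argument(
        "--builder_kwarg",
        action="append",
        default=[],
        metavar="KEY=VALUE",
    )
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--builder_kwarg", "language=fr", "--builder_kwarg", "date=20220301"]
)
builder_kwargs = parse_builder_kwargs(args.builder_kwarg)
print(builder_kwargs)
```

In a modified script, the resulting dict could then be forwarded when the builder is instantiated, along the lines of `builder_cls(..., **builder_kwargs)`, assuming the builder accepts those keywords. Note that all values arrive as strings.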
