I have downloaded the s2orc dataset and saved it to disk.
Since it’s in the Arrow format, I cannot figure out how to use it with the run_language_modeling script, which seems to require a text file.
It seems like it would be simple. Can anyone help?
I modified run_language_modeling.py a little to make it work.
I would still be interested to hear if this is supported.
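For anyone who would rather not patch the script, one possible workaround is to export the saved dataset to a plain text file first. This is only a rough sketch under some assumptions: that the data was written with `save_to_disk` and loads back as a single `Dataset` (not a `DatasetDict` with splits), and that the text sits in a column called "text". The column name, the directory, and the output file name below are placeholders, so check the dataset's actual columns before using it.

```python
from typing import Iterable, Mapping


def write_text_file(records: Iterable[Mapping], column: str, out_path: str) -> int:
    """Write one record per line to out_path and return the number of lines written."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            text = record.get(column)
            if text is None:
                continue
            # Collapse internal newlines and repeated whitespace so that
            # each record occupies exactly one line in the output file.
            line = " ".join(str(text).split())
            if not line:
                continue
            f.write(line + "\n")
            written += 1
    return written


def export_saved_dataset(dataset_dir: str, column: str, out_path: str) -> int:
    """Load a dataset saved with save_to_disk and dump one column to text."""
    # Imported here so the helper above works without the datasets library.
    from datasets import load_from_disk

    # Assumption: this returns a single Dataset, whose rows iterate as dicts.
    dataset = load_from_disk(dataset_dir)
    return write_text_file(dataset, column, out_path)


# Hypothetical usage; both paths and the column name are placeholders:
# export_saved_dataset("path/to/saved_s2orc", "text", "train.txt")
```

The resulting file could then be passed to the script as its text input (I believe the argument is `--train_data_file`, but check your version of the script). Writing one record per line is a choice I made here so the file also works with line-by-line processing. If you want to keep paragraph breaks, drop the whitespace-collapsing step.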