I have downloaded the s2orc dataset and saved it to disk.
Because it is stored in the Arrow format, I cannot figure out how to use it with the run_language_modeling command, which seems to require a plain text file.
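My best guess is that the text needs to be dumped to a plain text file first, one example per line. Below is a minimal sketch of that idea, assuming the data was written with `save_to_disk` from the `datasets` library so that `load_from_disk` can read it back. The function names, the `"train"` split default, and the column name passed in are my own placeholders, not anything confirmed for s2orc.

```python
from pathlib import Path
from typing import Iterable, Mapping


def write_text_column(records: Iterable[Mapping], column: str, out_path: str) -> int:
    """Write one example per line to a plain-text file; return the count written."""
    written = 0
    with Path(out_path).open("w", encoding="utf-8") as f:
        for record in records:
            text = record.get(column)
            if not text:
                continue  # skip rows where the column is missing or empty
            # Collapse newlines and repeated whitespace so each example stays on one line.
            f.write(" ".join(str(text).split()) + "\n")
            written += 1
    return written


def export_saved_dataset(dataset_dir: str, column: str, out_path: str,
                         split: str = "train") -> int:
    """Load a dataset saved with save_to_disk and export one column as text."""
    # Imported lazily so write_text_column works without `datasets` installed.
    from datasets import DatasetDict, load_from_disk

    ds = load_from_disk(dataset_dir)
    if isinstance(ds, DatasetDict):
        # Assumption: the split is named "train"; adjust to whatever was saved.
        ds = ds[split]
    return write_text_column(ds, column, out_path)
```

Something like `export_saved_dataset("path/to/saved_dataset", "abstract", "train.txt")` would then produce a file that could presumably be passed to the script as its training text file. Both the path and the `"abstract"` column are hypothetical here; the real column names can be checked by printing the loaded dataset.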
It seems like it would be simple. Can anyone help?