How to use a saved dataset with run_language_modeling?

I have downloaded the s2orc dataset and saved it to disk.

The dataset is stored in the Arrow format, but the run_language_modeling command seems to require a plain text file, so I cannot figure out how to pass the dataset to it.

It seems like it would be simple. Can anyone help?

I modified run_language_modeling.py a little to make it work.

I would still be interested to hear whether using a saved Arrow dataset directly is supported.
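For anyone who would rather not edit the script, one possible workaround is to export the saved dataset to a plain text file and pass that file to the command as usual. The following is only a sketch under some assumptions: the dataset was saved with `save_to_disk` as a single `Dataset` (not a `DatasetDict`), and the text lives in a column named `text`. The column name and the helper names `write_text_file` and `export_saved_dataset` are hypothetical, so adjust them to the actual data.

```python
from typing import Iterable, Mapping


def write_text_file(records: Iterable[Mapping], out_path: str, column: str = "text") -> int:
    """Write one example per line to out_path and return the number of lines written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            text = record.get(column)
            if not text:
                # Skip records where the column is missing or empty.
                continue
            # Collapse internal whitespace and newlines so each example stays on one line.
            f.write(" ".join(str(text).split()) + "\n")
            count += 1
    return count


def export_saved_dataset(dataset_dir: str, out_path: str, column: str = "text") -> int:
    """Load a dataset saved with save_to_disk and dump one column to a text file.

    Assumes dataset_dir holds a single Dataset, not a DatasetDict.
    """
    # Imported lazily so write_text_file can be used without the datasets library.
    from datasets import load_from_disk

    dataset = load_from_disk(dataset_dir)
    return write_text_file(dataset, out_path, column)
```

Calling `export_saved_dataset("path/to/saved_dataset", "train.txt")` (both paths are placeholders) should produce a text file with one example per line, which can then be given to the script as its training file. For a very large dataset this duplicates the data on disk, so modifying the script to read the saved dataset directly may still be the better option.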