How can I use an iterator to run transformers\examples\pytorch\summarization\run_summarization.py on huge data files (JSON Lines format, larger than 20 GB) without loading the whole file into memory? Any suggestions on how to use the script with JSON files of this size?
Framework versions:
Transformers 4.20.1
PyTorch 1.11.0+cu113
Datasets 2.3.2
Tokenizers 0.12.1
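
To make the question concrete, here is a minimal sketch of the kind of lazy, record-by-record reading I have in mind. It uses only the standard library; the helper name `iter_jsonl` and the `text`/`summary` keys are hypothetical, and the tiny temporary file stands in for the real 20 GB file. The `datasets` library also accepts `streaming=True` in `load_dataset`, which returns an iterable dataset, but I am not assuming that run_summarization.py works with it unmodified.

```python
import json
import os
import tempfile
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record at a time instead of reading the whole file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # the file object reads lazily, line by line
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Small stand-in for the real file; the keys below are hypothetical.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"text": "first document", "summary": "first"}\n')
    tmp.write('{"text": "second document", "summary": "second"}\n')
    sample_path = tmp.name

# Materialized here only because the sample is tiny; with the real file
# the generator would be consumed one record at a time.
records = list(iter_jsonl(sample_path))
os.remove(sample_path)
```

What I do not know is how to feed such an iterator into the script's preprocessing and training steps without the data being fully loaded at some point.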