Hi. I am encoding a large dataset with LayoutLMv3. The dataset has around 20,000 files, and after encoding I want to compress and store the result so I can reuse it later. I tried Python pickle, but it only supports objects up to about 2 GB, and my encoded data is around 4-5 GB. I also tried compressing to an .h5 file, but when I re-open the .h5 file, I can't use the data to train the model.
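For reference, my save/load code looks roughly like this (simplified; the `encodings` dict here is a tiny placeholder for the real 4-5 GB object, and the filename is just an example):

```python
import pickle

# Placeholder for the real encoded dataset (in practice, a dict of
# input_ids / attention_mask / bbox / pixel_values for ~20,000 files)
encodings = {
    "input_ids": [[101, 2023, 102]],
    "bbox": [[[0, 0, 10, 10], [10, 0, 20, 10], [0, 0, 0, 0]]],
}

# Save the encodings to disk
with open("encodings.pkl", "wb") as f:
    pickle.dump(encodings, f)

# Load them back later for training
with open("encodings.pkl", "rb") as f:
    restored = pickle.load(f)
```

This works for a small sample, but fails once the full encoded dataset is pickled.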
Is there any solution I can use to compress and store the encoded data? Thanks.