I have a dataset of 4M images with metadata, which I have already converted from a pandas DataFrame into an HF dataset.
I would like to add a column that holds the PIL image, as many existing datasets do.
Given the size of the dataset, I am not sure what the best approach is, since running a loop or map over it will definitely go OOM.
Any ideas?
Also, how do I create the JSON file so that, after saving to disk, I can load the dataset with load_dataset rather than load_from_disk?
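
To make the JSON part concrete, here is a minimal sketch of the kind of file I have in mind. It assumes the images already sit as files in one directory and that the imagefolder loader convention applies (a metadata.jsonl file next to the images, with a file_name column). The helper name write_metadata_jsonl and the example keys other than file_name are hypothetical. Rows are written one at a time, so nothing close to 4M records has to be held in memory.

```python
import json
from pathlib import Path
from typing import Iterable, Mapping


def write_metadata_jsonl(rows: Iterable[Mapping], image_dir: str) -> Path:
    """Stream rows to <image_dir>/metadata.jsonl, one JSON object per line.

    Hypothetical helper: `rows` can be any iterable (for example a generator
    over the pandas DataFrame), so the full metadata is never materialized.
    """
    out_path = Path(image_dir) / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for row in rows:
            # Assumption: "file_name" holds the image path relative to
            # image_dir, which is what the imagefolder convention expects.
            if "file_name" not in row:
                raise ValueError("each row needs a 'file_name' key")
            f.write(json.dumps(dict(row)) + "\n")
    return out_path
```

If that convention is the right one, I would expect load_dataset("imagefolder", data_dir=image_dir) to pick up the metadata and expose the images as a column, but I have not confirmed that this scales to 4M files.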