I’m trying to create a dataset for an object detection task. The training images are stored on S3, and I would eventually like to use SageMaker and an estimator to train the model.
I’m trying to build on the example from @philschmid in Huggingface Sagemaker - Vision Transformer but with my own dataset and the model from Fine-tuning DETR on a custom dataset for object detection by @nielsr .
If I understand correctly, I need to create a dataset first and then save it to the session bucket on S3. However, I’m not sure how to do that with a dataset that is too big to pull down locally first.
I have found the `load_dataset` function with the `imagefolder` option, which seems to do what I want for local image files but doesn’t appear to support file paths on S3. I have also found the `load_from_disk` function, which can load datasets from S3 but doesn’t have an `imagefolder` option.
What is the best way to prepare my data in this case?
Thanks for the help!