@philschmid the PR you linked has been merged, but as far as I can tell it does not include support for `imagefolder`. This is pretty important functionality, since as it stands I have two options:
- Download the entire dataset to SageMaker EFS, preprocess it, and save it to S3. This takes a lot of time and is inconvenient code-wise.
- Process the data every time in the HuggingFace Estimator `train.py` script. This is very costly and time-consuming, since e.g. during hyperparameter optimization I would have to repeat it in every estimator, and on a GPU instance.
Would it make sense to open a separate GitHub issue for this?
I basically want something like this, but without downloading everything from S3 manually.