I’ve previously used AutoTrain successfully for image classification problems, but I am currently running into issues uploading data.
Authentication: most of the datasets I want to test have been uploaded to the hub but cannot be shared publicly. When I try to select these datasets, I get an error.
I didn’t see an obvious way to pass in an auth token. Is this possible?
Column selection: When adding a public dataset (in this case, biglam/encyclopaedia_britannica_illustrated), the mapping options for image/label differ from the columns in the underlying dataset. The original dataset exposes an ‘image’ and a ‘label’ column (plus some other metadata columns). When loaded in AutoTrain, the image column seems to be expanded to image.src, image.height and image.width.
I’m unsure whether these are internal attributes or are meant to be publicly exposed. Choosing image.src, which I assumed would be the correct image column, in the mapping results in an error when loading.
Under the training tab an error is triggered when format_source is run:
Error type: InvalidColMappingError
Details: Column mapping {'label': 'target', 'image.src': 'image'} is invalid for data with columns ['image', 'label', 'id', 'meta'].
Column 'image.src' not found in data.
I assume this is because the internal loader is looking for image.src in the dataset and not finding it.
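To illustrate what I think is going on, here is a minimal sketch of the kind of check that would produce this error. This is not AutoTrain's actual code; `missing_columns` and `strip_nested_suffix` are hypothetical helpers I made up, and the only real values are the mapping and column list from the error message above.

```python
def missing_columns(mapping, columns):
    """Return the mapped source columns that are absent from the dataset."""
    return [src for src in mapping if src not in columns]


def strip_nested_suffix(mapping):
    """Collapse flattened names such as 'image.src' to their top-level column.

    Hypothetical workaround: assumes the part before the first dot is the
    real column name in the dataset.
    """
    return {src.split(".", 1)[0]: dst for src, dst in mapping.items()}


# Values copied from the error message above.
columns = ["image", "label", "id", "meta"]
mapping = {"label": "target", "image.src": "image"}

# The UI offers 'image.src', but the dataset only has 'image'.
print(missing_columns(mapping, columns))

# Mapping from the top-level column name would pass the same check.
print(missing_columns(strip_nested_suffix(mapping), columns))
```

If that assumption is right, the column picker is built from the flattened preview rows while the check runs against the real dataset columns, so the two never agree for image columns.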
Apologies if this has been addressed before; I dug around for other issues but didn’t see anything related.
Tagging @abhishek, who might be the best person to address this.
Training on gated or private datasets is not supported in AutoTrain yet.
Re: Column selection
There is indeed an issue with our integration with the datasets server, which AutoTrain uses to fetch the dataset’s first rows. We are currently working on fixing this. I’ll let you know when this is fixed.
Thanks for confirming – some of the datasets that I’m working on are small enough to upload via the AutoTrain interface (which allows you to keep them private), so I can work around this.
Thanks for this. The last time I used AutoTrain for image classification, it worked very well, so I'm looking forward to this fix.
Hi, I ran into a similar problem with Text Classification (binary). When I select column names, all the choices seem to come from the second row. Could you please fix it?
I was successfully able to load the beans dataset for image classification. I am currently trying to upload my own dataset using both the image folder upload and a dataset hosted on the hub; however, both of these seem to get stuck in the processing step for longer than I would expect for the size of these datasets.
Update for people who might run across this in the future: the issue, in this case, was that the images in my dataset were reasonably large. Resizing the images to 500px using the ImageMagick mogrify command fixed this, i.e. something like:
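For reference, a sketch of the same resizing idea in Python rather than ImageMagick, assuming Pillow is installed. The function names, the file extensions handled, and the choice to cap the longer side at 500px are all illustrative assumptions, not the exact command used above. Note that it overwrites files in place, so work on a copy of the dataset.

```python
from pathlib import Path


def target_size(width, height, max_side=500):
    """Return (w, h) scaled so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale small images
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_folder(folder, max_side=500):
    """Shrink every JPEG/PNG under folder in place (hypothetical helper)."""
    from PIL import Image  # Pillow; imported here so target_size works without it

    for path in Path(folder).rglob("*"):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        with Image.open(path) as img:
            new_size = target_size(*img.size, max_side=max_side)
            resized = img.resize(new_size) if new_size != img.size else None
        # Save after the source file is closed to avoid writing to an open file.
        if resized is not None:
            resized.save(path)
```

Calling `resize_folder("my_dataset_copy")` before uploading would keep the class sub-folders intact, since only the image files themselves are rewritten.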