I’ve previously tested AutoTrain successfully on image classification problems, but I am currently running into issues uploading data.
Authentication: most of the datasets I want to test have been uploaded to the Hub but cannot be shared publicly. When I try to select these datasets, I get an error.
I didn’t see an obvious way to pass in an auth token. Is this possible?
Column selection: when adding a public dataset (in this case, biglam/encyclopaedia_britannica_illustrated), the mapping options for image/labels differ from the underlying dataset. The original dataset exposes an ‘image’ and a ‘label’ column (plus some other metadata columns). When loaded in AutoTrain, image seems to be expanded to image.src, image.height and image.width.
I’m unsure whether these are internal attributes or are meant to be publicly exposed. Choosing image.src, which I assumed would be the correct image column, as the image column in the mapping results in an error when loading.
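One possible explanation, and this is only an assumption, is that the column picker is built from a JSON preview of the rows in which each image cell is a nested object with src, height and width keys, and that those nested keys are flattened into dotted names. A minimal sketch of that kind of flattening follows; flatten_columns and preview_row are hypothetical names and values for illustration, not anything taken from AutoTrain.

```python
from typing import Any


def flatten_columns(row: dict[str, Any], sep: str = ".") -> list[str]:
    """Return column names, expanding nested dict keys into dotted paths."""
    names: list[str] = []
    for key, value in row.items():
        if isinstance(value, dict):
            # A nested object contributes one dotted name per inner key.
            names.extend(f"{key}{sep}{sub}" for sub in flatten_columns(value, sep))
        else:
            names.append(key)
    return names


# Hypothetical preview row (assumed shape, placeholder values).
preview_row = {
    "image": {"src": "https://example.invalid/0.jpg", "height": 480, "width": 640},
    "label": 1,
    "id": "example-id",
    "meta": "example-meta",
}

flattened = flatten_columns(preview_row)
print(flattened)
```

If something like this is what happens, the dotted names would exist only in the preview and not in the dataset that the training step actually reads.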
Under the training tab, an error is triggered when format_source is run:
Error type: InvalidColMappingError
Details: Column mapping {'label': 'target', 'image.src': 'image'} is invalid for data with columns ['image', 'label', 'id', 'meta'].
Column 'image.src' not found in data.
I assume this is because the internal loader is looking for image.src in the dataset and not finding it.
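To make that assumption concrete, here is a small sketch of the kind of check that would produce the message above. It is not AutoTrain's actual implementation; missing_columns is a hypothetical helper, and the mapping and column list are copied from the error details.

```python
def missing_columns(col_mapping: dict[str, str], data_columns: list[str]) -> list[str]:
    """Return the source columns named in the mapping that the data lacks."""
    return [source for source in col_mapping if source not in data_columns]


# Values taken from the error message in this post.
data_columns = ["image", "label", "id", "meta"]
failing_mapping = {"label": "target", "image.src": "image"}

# Hypothetical mapping that uses the dataset's real column name instead.
plain_mapping = {"label": "target", "image": "image"}

missing_with_src = missing_columns(failing_mapping, data_columns)
missing_with_image = missing_columns(plain_mapping, data_columns)

print(missing_with_src)
print(missing_with_image)
```

Under this reading, a mapping from image (rather than image.src) would pass the column check, but the UI does not seem to offer that option.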
Apologies if this has been addressed before; I dug around for other issues but didn’t see anything related.
Tagging @abhishek, who might be the best person to address this.