Extract visual and contextual features from images

Hi Niels,

Thanks for your answer, I will check it out. Do you have any recommendations for rebuilding this with the Transformers library, without the timm model? 🙂