The transformers models are designed for Natural Language Processing, i.e. text. What makes you think they would work well for image features?
I expect you could bypass the tokenizer and feed numbers in directly, but I'm not sure it would do anything useful. If you do want to try that, you would need to make sure your numbers are in the right format. For example, a BERT model expects a vector of 768 real numbers for each word-token, or rather a matrix of N x 768 real numbers, where N is the number of word-tokens in the text.
What size is your image feature array?
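To make that format concrete, here is a rough sketch of what bypassing the tokenizer might look like, using the inputs_embeds argument of BertModel (a randomly initialized small config is used here just to illustrate shapes, not a pretrained model — treat this as an assumption about your setup, not a recommendation):

```python
import torch
from transformers import BertConfig, BertModel

# Small, randomly initialized BERT just to demonstrate the inputs_embeds path.
config = BertConfig(hidden_size=768, num_hidden_layers=2, num_attention_heads=12)
model = BertModel(config)
model.eval()

# Pretend these are image features: N "tokens", each a 768-dim real vector.
n_tokens = 10
features = torch.randn(1, n_tokens, config.hidden_size)  # (batch, N, 768)

with torch.no_grad():
    # inputs_embeds skips the tokenizer/embedding lookup entirely.
    outputs = model(inputs_embeds=features)

print(outputs.last_hidden_state.shape)  # same (batch, N, 768) shape back
```

Whether the output means anything is another question — the pretrained weights were fit to word embeddings, not image features, so you would almost certainly need to fine-tune.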
The main trick of transformers is the Attention mechanism. It is certainly possible to use Attention in image-recognition models without using transformers; see this article for an example: https://arxiv.org/abs/2004.13621
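For reference, the core of that trick is small: scaled dot-product attention mixes each row of the values according to query/key similarity. A minimal NumPy sketch (illustrative only, single head, no learned projections):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (N, d) arrays. Each output row is a weighted average of v's rows.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)       # (N, N) pairwise similarity
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v                  # (N, d)

q = k = v = np.random.randn(5, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (5, 8)
```

Nothing in this computation cares whether the N input vectors came from words or from image patches, which is why Attention transfers to vision at all.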