Can LayoutLM be used for images?


I am very new to transformers and found out about it when looking for a LayoutLM implementation.

Now from my understanding, LayoutLM can be used to extract information from a document based on the layout it guessed.

When browsing the documentation, I could only see examples using plain text and I don’t know where to begin to put an image instead.

If it would be possible to help a newbie like me by showing how to pass it an image and how to interpret the results, you would really make me a happy man!!

I really hope someone can help me.

Have a great day :slight_smile:

Hi eveningkid,

Transformer models are designed for text.

It might be possible to force the model to accept a numeric representation of an image (after all, it’s all ones and noughts), but it would be unlikely to do anything useful.

In particular, the image embeddings have not been implemented and open-sourced. You can see this thread, but according to it, adding them would be harder.
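
For what it's worth, as far as I understand it, the usual route is not to pass the image itself. You run OCR on it first and give LayoutLM the recognised words together with their bounding boxes, scaled to a 0-1000 range. Here is a minimal sketch of that scaling step. The page size, the OCR words, and the `normalize_bbox` helper are all made up for illustration, and I'm using integer division to keep the coordinates as plain ints.

```python
# Hypothetical OCR output for one page: (word, (x0, y0, x1, y1)) in pixels.
# In practice these would come from an OCR engine run on the document image.
PAGE_WIDTH, PAGE_HEIGHT = 800, 1000
ocr_words = [
    ("Invoice", (80, 100, 240, 150)),
    ("Total:", (80, 900, 200, 950)),
]


def normalize_bbox(box, width, height):
    """Scale a pixel box to the 0-1000 coordinate range (illustrative helper)."""
    x0, y0, x1, y1 = box
    return [
        1000 * x0 // width,
        1000 * y0 // height,
        1000 * x1 // width,
        1000 * y1 // height,
    ]


# Words and their layout boxes, kept in the same order.
words = [word for word, _ in ocr_words]
boxes = [normalize_bbox(box, PAGE_WIDTH, PAGE_HEIGHT) for _, box in ocr_words]

for word, box in zip(words, boxes):
    print(word, box)
```

The words would then be tokenized as usual, with each token reusing the box of the word it came from, and the boxes passed to the model alongside the token ids. The predictions come back per token, so interpreting them means mapping tokens back to words and their positions on the page.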