How to extract text using LayoutLMv2

I'm trying to use LayoutLMv2 to extract information from pictures of invoices.
So far, based on the example here, I've run the following:

```python
from transformers import LayoutLMv2Processor, LayoutLMv2ForQuestionAnswering
from PIL import Image
import torch

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForQuestionAnswering.from_pretrained("microsoft/layoutlmv2-base-uncased")

# Path to a page image such as a PNG; a PDF page has to be rendered to an image first.
image = Image.open("name_of_your_document.png").convert("RGB")
question = "what's his name?"

# The processor runs OCR on the image and tokenizes the question together with the OCR words.
encoding = processor(image, question, return_tensors="pt")

# Dummy answer-span labels; they are only needed to compute a loss.
start_positions = torch.tensor([1])
end_positions = torch.tensor([3])

outputs = model(**encoding, start_positions=start_positions, end_positions=end_positions)
loss = outputs.loss
start_scores = outputs.start_logits
end_scores = outputs.end_logits
```

However, I'm not sure how to get the answer as text from these outputs, rather than just a bounding box. There are a few Gradio apps on Spaces that cover the visual side, but all of their examples draw boxes on the picture, and I want to extract the answer as text.
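For reference, here is a minimal sketch of the usual way to turn extractive question-answering logits into text: pick the highest-scoring token span and decode those token ids with the tokenizer. The helper names `best_span` and `extract_answer` and the `max_answer_len` cutoff are hypothetical, introduced only for illustration. The sketch assumes `encoding`, `outputs`, and `processor` come from the code above. Note that `microsoft/layoutlmv2-base-uncased` is not fine-tuned for question answering, so a meaningful answer would likely require a checkpoint fine-tuned on a document QA dataset.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) token indices with the highest combined score.

    Only spans with start <= end and at most max_answer_len tokens are considered.
    """
    best, best_score = (0, 0), float("-inf")
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


def extract_answer(input_ids, start_logits, end_logits, decode):
    """Decode the best-scoring span of input_ids into a string.

    `decode` is any callable mapping a list of token ids to text.
    """
    start, end = best_span(start_logits, end_logits)
    return decode(input_ids[start : end + 1])


# Hypothetical usage with the objects from the question (batch size 1):
# answer = extract_answer(
#     encoding["input_ids"][0].tolist(),
#     outputs.start_logits[0].tolist(),
#     outputs.end_logits[0].tolist(),
#     lambda ids: processor.tokenizer.decode(ids, skip_special_tokens=True),
# )
# print(answer)
```

For inference, `start_positions` and `end_positions` can be left out of the model call, since they are only used to compute the loss. Searching over valid spans, rather than taking the two argmax positions independently, avoids the case where the predicted end lands before the predicted start.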
