I'm trying to use LayoutLMv2 to extract information from invoice pictures.
So far, based on what's described here, I've run the following:
```python
from transformers import LayoutLMv2Processor, LayoutLMv2ForQuestionAnswering
from PIL import Image
import torch

processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
model = LayoutLMv2ForQuestionAnswering.from_pretrained("microsoft/layoutlmv2-base-uncased")

# PIL cannot open PDFs directly, so use an image file (png, jpg, ...)
image = Image.open("name_of_your_document.png").convert("RGB")
question = "what's his name?"
encoding = processor(image, question, return_tensors="pt")

# start/end positions are labels; they are only needed to compute a training loss
start_positions = torch.tensor([1])
end_positions = torch.tensor([3])
outputs = model(**encoding, start_positions=start_positions,
                end_positions=end_positions)
loss = outputs.loss
start_scores = outputs.start_logits
end_scores = outputs.end_logits
```
However, I'm not exactly sure how to get from this to the actual words of the answer, rather than just a visual box. There are a few Gradio apps over on Spaces that do something similar, but all of them draw boxes on the image, and I want to extract the answer as text.
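In case it helps to make the question concrete: my understanding is that you take the argmax of `start_logits` and `end_logits` to get a token span, and then decode that span back to text with the processor's tokenizer, e.g. `processor.tokenizer.decode(encoding.input_ids[0][start_idx : end_idx + 1], skip_special_tokens=True)`. A minimal pure-Python sketch of that span-decoding logic, with made-up logits and tokens standing in for the real model outputs:

```python
# Hypothetical stand-ins for outputs.start_logits, outputs.end_logits,
# and the tokens of encoding.input_ids (NOT real model output).
start_logits = [0.1, 2.0, 0.3, 0.2]
end_logits = [0.1, 0.2, 3.0, 0.4]
tokens = ["[CLS]", "john", "doe", "[SEP]"]

# Pick the most likely start and end token positions.
start_idx = max(range(len(start_logits)), key=start_logits.__getitem__)
end_idx = max(range(len(end_logits)), key=end_logits.__getitem__)

# Join the tokens in the predicted span; with a real tokenizer you
# would call processor.tokenizer.decode() on the input_ids slice instead.
answer = " ".join(tokens[start_idx : end_idx + 1])
print(answer)  # prints: john doe
```

Is this the right way to turn the logits into a text answer, or is there a cleaner built-in way?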