Hi, I am using LayoutLMv3 with the LayoutLMv3 processor. At training time I have labels along with the image, text, and boxes, but at inference time I do not have labels. My inference code looks like this:
import numpy as np
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("microsfotlmv3_repo", apply_ocr=False)
encoding = processor(
    images=resize_image,
    text=tokens,
    boxes=boxes,
    return_offsets_mapping=True,
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512,
)
offset_mapping = encoding.pop('offset_mapping')
outputs = test_model1(**encoding)
predictions = outputs.logits.argmax(-1).squeeze().tolist()
is_subword = np.array(offset_mapping.squeeze().tolist())[:,0] != 0
true_predictions = [id2label[pred] for idx, pred in enumerate(predictions) if not is_subword[idx]]
Currently, I am decoding the text like this:
cleaned_input_ids = encoding['input_ids'][encoding['attention_mask']>0]
text = processor.tokenizer.decode(cleaned_input_ids.squeeze().tolist())
text = text[4:-4]
tokens = text.split(" ")
But the number of tokens does not match the number of true_predictions.
I am expecting the result to be:
"sun": label,
"rises": label,
"in": label,
"the": label,
"east": label
Currently I am not able to map them because their lengths do not match. How can I resolve this?
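For reference, here is a sketch of the mapping I am after, not a confirmed fix. As far as I can tell, special tokens and padding also have a (0, 0) offset, so they pass the is_subword filter, and splitting the decoded string on spaces does not necessarily give back the original words. The sketch assumes the processor uses a fast tokenizer, so that encoding.word_ids(batch_index=0) returns one entry per token (None for special tokens and padding, otherwise the index of the source word). The helper name align_words_with_labels and the sample word_ids, predictions, and id2label values are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def align_words_with_labels(
    words: List[str],
    word_ids: List[Optional[int]],
    predictions: List[int],
    id2label: Dict[int, str],
) -> List[Tuple[str, str]]:
    """Pair each original word with the label predicted for its first sub-token."""
    pairs = []
    seen = set()
    for token_index, word_id in enumerate(word_ids):
        # None marks special tokens (<s>, </s>) and padding: skip them.
        if word_id is None:
            continue
        # Only the first sub-token of each word contributes a label.
        if word_id in seen:
            continue
        seen.add(word_id)
        pairs.append((words[word_id], id2label[predictions[token_index]]))
    return pairs


# Hypothetical values; in the real pipeline these would come from
# encoding.word_ids(batch_index=0) and outputs.logits.argmax(-1).
words = ["sun", "rises", "in", "the", "east"]
word_ids = [None, 0, 1, 1, 2, 3, 4, None, None, None]
predictions = [0, 1, 2, 0, 0, 0, 3, 0, 0, 0]
id2label = {0: "O", 1: "B-SUBJ", 2: "B-VERB", 3: "B-LOC"}

pairs = align_words_with_labels(words, word_ids, predictions, id2label)
for word, label in pairs:
    print(f"{word}: {label}")
```

This uses the original tokens list passed to the processor instead of the decoded text, so the number of pairs equals the number of words that survived truncation.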
Tagging @nielsr and others.