LayoutLMv3 sequence_length vs token sequence_length size mismatch

I am trying to use LayoutLMv3Model to extract features for every word/token, but the output sequence length is different from the input's. The output shape is (batch, sequence_length, embedding_dim), and the sequence length is always 197 more than the length of the input_ids (e.g., if I set max_length=100 in the processor, my input_ids have length 100 but the model output is (batch_size, 297, 768)). I am not sure what's happening and would like to know a way to map every token/subtoken to its extracted embedding. Are these the visual embeddings?


This is because LayoutLMv3 uses both image and text modalities as input. The 197 comes from the fact that there are 196 image patches + 1 special CLS token for the visual stream (the image resolution is 224 and the patch resolution is 16, so (224/16)**2 = 196). So if you have input_ids of length 100, then the total number of tokens sent through the Transformer is 100 + 197 = 297.
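To make the arithmetic concrete, here is a minimal sketch with a dummy tensor standing in for the model output. It assumes the text tokens come first in the concatenated sequence, followed by the 197 visual tokens; the variable names are just for illustration:

```python
import numpy as np

# Assumed LayoutLMv3 defaults: 224x224 image, 16x16 patches, hidden size 768.
image_size, patch_size, hidden_dim = 224, 16, 768
num_patches = (image_size // patch_size) ** 2   # 196 image patches
visual_len = num_patches + 1                    # + 1 visual CLS token = 197

text_len = 100                                  # max_length used in the processor
total_len = text_len + visual_len               # 297 tokens through the Transformer

# Dummy stand-in for outputs.last_hidden_state of shape (batch, 297, 768):
last_hidden_state = np.zeros((1, total_len, hidden_dim))

# Assuming text tokens precede the visual tokens, the text part is the
# first text_len positions:
text_embeddings = last_hidden_state[:, :text_len, :]
```

With max_length=100 this gives text_embeddings of shape (1, 100, 768), matching the input_ids, while the remaining 197 positions are the visual tokens.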


Thanks for clearing that up. I went through the code and there is a separate encoder block after the concatenation of the visual and token embeddings, so I wanted to confirm: if I simply do output_embedding[1:len(subword_tokens)+1] (assuming subword_tokens excludes CLS/PAD) to get embeddings for the subwords, do those features now also contain information from both modalities?
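The slicing described above can be sketched as follows, again with a dummy tensor in place of the real model output. The word_ids mapping shown here is hypothetical; in practice it would come from the BatchEncoding returned by the processor (via word_ids()), which maps each subword token back to its source word:

```python
import numpy as np

# Dummy stand-in for outputs.last_hidden_state: 100 text tokens + 197 visual tokens.
text_len, visual_len, hidden_dim = 100, 197, 768
last_hidden_state = np.random.rand(1, text_len + visual_len, hidden_dim)

# Skip [CLS] at index 0 and take only the real subword positions
# (assumes text tokens come first in the concatenated sequence).
n_subwords = 7  # hypothetical number of subword tokens, excluding CLS/SEP/PAD
subword_embeddings = last_hidden_state[0, 1:1 + n_subwords]

# Hypothetical word_ids for 4 words split into 7 subwords; in practice use
# the encoding's word_ids() and drop the None entries for special tokens.
word_ids = [0, 0, 1, 2, 2, 2, 3]
num_words = max(word_ids) + 1

# One common choice: average the subword embeddings belonging to each word.
word_embeddings = np.stack([
    subword_embeddings[[i for i, w in enumerate(word_ids) if w == j]].mean(axis=0)
    for j in range(num_words)
])
```

Averaging is just one pooling choice; taking the first subword of each word is another common option. Either way, since these vectors come from the encoder that runs after the text/visual concatenation, they have attended over both modalities.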