LayoutLMv2 token classification on documents with more than 512 tokens

Hello everyone, I am trying to fine-tune/create a LayoutLMv2 model for documents with more than 512 tokens. I have tried the following, but it's not working:

Initializing the tokenizer and LayoutLMv2 from scratch:
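Roughly along these lines (a minimal sketch, not my exact code; max_position_embeddings=1024 and the label count are placeholder values):

```python
from transformers import (
    LayoutLMv2Config,
    LayoutLMv2ForTokenClassification,
    LayoutLMv2TokenizerFast,
)

# Custom config so that sequences longer than 512 tokens fit in one pass;
# the model weights are randomly initialized from this config.
config = LayoutLMv2Config(
    max_position_embeddings=1024,
    num_labels=7,              # placeholder label count
    # num_hidden_layers=24,    # the larger values mentioned below are what
    # num_attention_heads=16,  # run out of memory on Colab
)

# The tokenizer itself is taken from the base checkpoint; only its
# model_max_length is raised to match the config.
tokenizer = LayoutLMv2TokenizerFast.from_pretrained(
    "microsoft/layoutlmv2-base-uncased", model_max_length=1024
)

model = LayoutLMv2ForTokenClassification(config)
```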

That is how I am initializing the tokenizer and model. I am training on 50 data instances, but the training loss per epoch clearly shows overfitting: the loss drops as a very steep curve.

I wanted to change num_hidden_layers to 24 and num_attention_heads to 16, but on Google Colab this gives a CUDA out-of-memory error.

I want to know if I am doing this right or if I am missing something. Before I move to SageMaker to train the model with num_hidden_layers=24 and num_attention_heads=16 on a bigger GPU, I want to make sure the overall approach is correct. Looking forward to your helpful responses.

Hi @navdeep, were you able to find a solution? What workaround did you follow to solve this?

I have tried LayoutLMv3 and changed the model's position-embedding size from 512 to 1024. I was able to train, but not able to load the model at inference time.
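For anyone trying the same thing, a rough sketch of one way to do that resize so the checkpoint also reloads at inference time (the checkpoint name, label count, and output path below are placeholders) is to let from_pretrained rebuild the mismatched position-embedding table and then save the model and config together:

```python
from transformers import LayoutLMv3ForTokenClassification

# Build the model with a larger position-embedding table; the mismatched
# (position-embedding) weights are re-initialized instead of raising a
# shape error, and have to be learned again during fine-tuning.
model = LayoutLMv3ForTokenClassification.from_pretrained(
    "microsoft/layoutlmv3-base",
    num_labels=7,                   # placeholder label count
    max_position_embeddings=1026,   # ~1024 text tokens + RoBERTa-style offset of 2
    ignore_mismatched_sizes=True,
)

# Saving writes the updated config next to the weights, so the same
# from_pretrained call works again at inference time; a config that still
# says 512 is the usual reason loading fails after such a resize.
model.save_pretrained("layoutlmv3-1024")   # placeholder path
```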

Hi @purnasai, I am still implementing the solution based on this issue: more than 512 tokens · Issue #23 · NielsRogge/Transformers-Tutorials (github.com). Although that issue revolves around just the text, I am trying to extend it to the images. I haven't tried LayoutLMv3 yet, but I will try it and keep this thread posted. Thanks

Hi @navdeep, I can think of 4 workarounds here:

  1. Replacing the existing tokenizer with another tokenizer that can handle 1024 or 2048 tokens.
  2. Using the stride option in the processor, together with a collate function, to feed in data with length > 512 (see the sketch after this list).
  3. Replacing 512 with 1024 in the model architecture.
  4. Cropping the image so that each crop contains at most 512 tokens (i.e., cropping the image into 2 parts).
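A rough sketch of workaround 2 with the LayoutLMv2 processor (the words, boxes, labels, and stride=128 below are placeholder values):

```python
from PIL import Image
from transformers import LayoutLMv2Processor

# apply_ocr=False because words and boxes come from our own OCR step.
processor = LayoutLMv2Processor.from_pretrained(
    "microsoft/layoutlmv2-base-uncased", apply_ocr=False
)

image = Image.open("document.png").convert("RGB")                  # placeholder document
words = ["invoice", "number", "12345"]                             # placeholder OCR words
boxes = [[10, 10, 80, 30], [90, 10, 160, 30], [170, 10, 240, 30]]  # 0-1000 normalized boxes
word_labels = [0, 1, 2]                                            # placeholder label ids

# stride + return_overflowing_tokens splits a long document into several
# overlapping 512-token windows; the processor repeats the image for each
# window using the overflow_to_sample_mapping.
encoding = processor(
    image,
    words,
    boxes=boxes,
    word_labels=word_labels,
    truncation=True,
    max_length=512,
    stride=128,
    padding="max_length",
    return_overflowing_tokens=True,
    return_offsets_mapping=True,   # the processor expects offsets when overflowing tokens are returned
    return_tensors="pt",
)
encoding.pop("offset_mapping")
encoding.pop("overflow_to_sample_mapping")

# encoding["input_ids"] now has shape (num_windows, 512); a collate function
# can treat every window as a separate training example.
```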

I have workaround 1 here: LayoutLMV3 Training with Morethan 512 tokens. · Issue #19190 · huggingface/transformers · GitHub

Sorry for the delayed reply, @purnasai. I actually tried using a custom tokenizer, custom processor, and a new custom model with a new configuration with max sequence length = 1024. The model was able to handle the longer sequences (more than 512 tokens), but with its internal architecture of 12 hidden layers and 12 attention heads it was giving me bad accuracy. As I had very little data to train on, I will not say that this solution won't work for others. If I try to change the internal architecture of the model using the new config object (for example, attention heads to 24 and hidden layers to 16), PyTorch runs out of memory.

Current situation: I will try the same scenario with LayoutLMv3 and see if that works (90% chance it won't).

@purnasai, can you please tell me if you are initializing the model from base-uncased for downstream training, or from scratch using a custom configuration object?
Thanks

Hi @navdeep, using a custom tokenizer, processor, and custom model would increase the complexity of the use case. On top of that, you are changing attention heads and hidden layers. Having to learn the weights from the beginning would also increase the computation time and run out of memory. Like you said, since you do not have much data to train on, I would say the above process is not a good approach.

I am using base-uncased, as I want to make use of the pretrained weights for the downstream task.

Thanks for the info, @purnasai. I have resolved it using the return_overflowing_tokens param in the processor and writing a custom __getitem__ function.
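In case it helps someone else, the idea looks roughly like this (a simplified sketch; the dataset fields, stride, and label handling are placeholders for my actual pipeline):

```python
from torch.utils.data import Dataset

class LongDocTokenClassificationDataset(Dataset):
    """Splits each long document into overlapping 512-token windows with
    return_overflowing_tokens and exposes every window as its own example."""

    def __init__(self, examples, processor):
        # examples: list of dicts with "image", "words", "boxes", "labels" (placeholder schema)
        self.chunks = []
        for ex in examples:
            encoding = processor(
                ex["image"],
                ex["words"],
                boxes=ex["boxes"],
                word_labels=ex["labels"],
                truncation=True,
                max_length=512,
                stride=128,
                padding="max_length",
                return_overflowing_tokens=True,
                return_offsets_mapping=True,
                return_tensors="pt",
            )
            encoding.pop("offset_mapping")
            encoding.pop("overflow_to_sample_mapping")
            # Flatten: one entry per 512-token window (kept in memory up front
            # for simplicity; a lazy version would encode inside __getitem__).
            num_windows = encoding["input_ids"].shape[0]
            for i in range(num_windows):
                self.chunks.append({key: value[i] for key, value in encoding.items()})

    def __len__(self):
        return len(self.chunks)

    def __getitem__(self, idx):
        # Each item is a dict of fixed-size tensors, so the default
        # DataLoader collate function can batch the windows directly.
        return self.chunks[idx]
```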