Hello everyone, I am trying to fine-tune/create a LayoutLMv2 model for documents with more than 512 tokens. I have tried the following, but it is not working:
Initializing the tokenizer and LayoutLMv2 from scratch:
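Roughly, the initialization looks something like this (a sketch; the values below are illustrative, not my exact code):

```python
from transformers import LayoutLMv2Config

# Sketch (values are assumptions): a fresh LayoutLMv2 configuration with a
# longer maximum sequence length than the default 512.
config = LayoutLMv2Config(max_position_embeddings=1024)

# A model built from this config starts with random weights, i.e. it is
# trained entirely from scratch. (Instantiating the model, and the
# tokenizer/processor, additionally requires detectron2 to be installed.)
#   from transformers import LayoutLMv2Model
#   model = LayoutLMv2Model(config)
print(config.max_position_embeddings, config.num_hidden_layers, config.num_attention_heads)
```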
That is how I am initializing the tokenizer and model. I am training on 50 data instances, but the training-loss-per-epoch curve clearly shows overfitting: the loss drops as a very steep curve.
I wanted to change num_hidden_layers=24 and num_attention_heads=16, but on Google Colab this gives a CUDA out-of-memory error.
I want to know if I am doing this right or if I am missing something. Before I move to SageMaker to train the model with num_hidden_layers=24 and num_attention_heads=16 on a bigger GPU, I want to make sure I am doing it right. Looking forward to your helpful responses.
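Before paying for a bigger GPU, a quick sanity check is to estimate how the parameter count grows with the config. This back-of-the-envelope sketch assumes the standard BERT-style encoder shape (hidden_size=768, intermediate_size=3072) that LayoutLMv2's text stream follows; embeddings and the visual backbone are ignored:

```python
# Back-of-the-envelope parameter estimate for one BERT-style encoder layer.
# Assumptions: hidden_size=768, intermediate_size=3072.
hidden, intermediate = 768, 3072

attn = 4 * (hidden * hidden + hidden)                      # Q, K, V, output projections (+ biases)
ffn = 2 * (hidden * intermediate) + intermediate + hidden  # two feed-forward dense layers (+ biases)
norms = 2 * 2 * hidden                                     # two LayerNorms (weight + bias each)
per_layer = attn + ffn + norms

print(f"per layer: {per_layer / 1e6:.1f}M")
print(f"12 layers: {12 * per_layer / 1e6:.1f}M, 24 layers: {24 * per_layer / 1e6:.1f}M")

# Doubling num_hidden_layers roughly doubles both the weights and the
# per-token activation memory, which is why a Colab GPU runs out of memory.
# num_attention_heads only re-partitions the 768-dim projections, so changing
# 12 -> 16 heads adds no parameters at all.
```

So the layer count, not the head count, is what drives the memory blow-up.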
Sorry for the delayed reply @purnasai. I actually tried using a custom tokenizer, a custom processor, and a new custom model built from a new configuration with max sequence length = 1024. The model was able to handle more than 512 tokens, but with its internal architecture of 12 hidden layers and 12 attention heads it gave me poor accuracy. As I had very little data to train on, I will not say that this solution won't work for others. If I try to change the internal architecture of the model through the new config object (for example, num_hidden_layers=24 and num_attention_heads=16), PyTorch runs out of memory.
Current situation: I will try the same scenario with LayoutLMv3 and see if that works (90% chance it won't).
@purnasai, can you please tell me whether you are initializing the model from base-uncased for downstream training, or from scratch using a custom configuration object?
Thanks
Hi @navdeep, using a custom tokenizer, processor, and custom model would increase the complexity of the use case. On top of that, you are changing the attention heads and hidden layers. Having to learn the weights from the beginning would also increase the computation time, and it goes out of memory. As you said, since you do not have much data to train on, I would say the above process is not a good approach.
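A commonly used alternative to retraining from scratch is to keep the pretrained 512-token model and split long documents into overlapping windows (the `transformers` tokenizers expose this via the `stride` and `return_overflowing_tokens` arguments). The sketch below shows the same idea in plain Python; all names and sizes are illustrative:

```python
def sliding_windows(tokens, window=510, stride=128):
    """Split a long token list into overlapping chunks that each fit a
    512-limit model (510 leaves room for [CLS]/[SEP]). Illustrative sketch
    of what HF tokenizers do via stride/return_overflowing_tokens."""
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += window - stride  # step forward, keeping `stride` tokens of overlap

    return chunks

doc = list(range(1200))  # stand-in for 1200 token ids
chunks = sliding_windows(doc)
print([len(c) for c in chunks])  # → [510, 510, 436]
```

Each window is run through the pretrained model separately and the predictions are merged afterwards (e.g. taking the prediction from the window where a token is farthest from the edge), so no architecture change or from-scratch training is needed.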