Adding another head to Vision encoder decoder model

I want to add another head to a VisionEncoderDecoder model, namely Donut, but when I do so the second head doesn't seem to learn anything (the loss barely decreases). Additionally, I can't use the `generate` function with two heads.
So, is there a tutorial, or does someone have a notebook or a hint that can help me make such modifications?

When you create the model it has two components: one is the image part and the other the LLM. So you can use the Donut model for this part. Then save the model as pretrained (convert to fp16 first), and you can treat it like a fresh model and train it!
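The convert-to-fp16-then-save-pretrained step might look like the following. This is a minimal sketch: tiny randomly initialized configs stand in for the full Donut checkpoint so it runs quickly, and the save path is a placeholder.

```python
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Tiny random configs stand in for the real Donut weights here.
enc_cfg = ViTConfig(hidden_size=32, num_hidden_layers=1, num_attention_heads=2,
                    intermediate_size=64, image_size=32, patch_size=8)
dec_cfg = GPT2Config(n_embd=32, n_layer=1, n_head=2, vocab_size=100,
                     add_cross_attention=True, is_decoder=True)
cfg = VisionEncoderDecoderConfig.from_encoder_decoder_configs(enc_cfg, dec_cfg)
model = VisionEncoderDecoderModel(cfg)

# Convert to fp16, then save it like any pretrained model.
model = model.half()
model.save_pretrained("./ved-fp16")

# Later: reload it as if it were a fresh pretrained model and fine-tune.
reloaded = VisionEncoderDecoderModel.from_pretrained(
    "./ved-fp16", torch_dtype=torch.float16
)
```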

I also did a similar project to this, and it worked fine! The problem was that the GGUF could not be made, as it is not a compatible model, so it will need to run as weights.

Thank you for your answer!
So, you are advising me to save my model after adding the second head, reimport it, and then train it. Did I get it right?

Yes, as the memory in Colab times you out and can disconnect the runtime.

Later I made a Mistral with fewer layers (i.e. 1B, but training was taking too long). So it's probably right to add a fully trained model despite its size. LeroyDyer/Mixtral_AI_MiniTronVision was a model I used the small brain on!
LeroyDyer/Mixtral_AI_MiniTronSpeech << the speech version :slight_smile:

One is a VisionEncoderDecoder and the other a SpeechEncoderDecoder :slight_smile:

Vmodel = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",
    "LeroyDyer/Mixtral_AI_Tiny",
)
_Encoder_ImageProcessor = Vmodel.encoder
_Decoder_ImageTokenizer = Vmodel.decoder
_VisionEncoderDecoderModel = Vmodel



# Add pad tokens
LM_MODEL.VisionEncoderDecoder = _VisionEncoderDecoderModel

# Add sub-components
LM_MODEL.Encoder_ImageProcessor = _Encoder_ImageProcessor
LM_MODEL.Decoder_ImageTokenizer = _Decoder_ImageTokenizer
LM_MODEL
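The pad-token step is usually what unblocks `generate` on a VisionEncoderDecoderModel: it refuses to generate until `decoder_start_token_id` and `pad_token_id` are set on its config. A minimal sketch with tiny random configs; the token ids 0 and 1 are placeholders that would come from your decoder tokenizer in practice:

```python
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

enc_cfg = ViTConfig(hidden_size=32, num_hidden_layers=1, num_attention_heads=2,
                    intermediate_size=64, image_size=32, patch_size=8)
dec_cfg = GPT2Config(n_embd=32, n_layer=1, n_head=2, vocab_size=100,
                     add_cross_attention=True, is_decoder=True)
model = VisionEncoderDecoderModel(
    VisionEncoderDecoderConfig.from_encoder_decoder_configs(enc_cfg, dec_cfg)
)

# Mirror the decoder tokenizer's special tokens onto the model config;
# generate() fails without decoder_start_token_id / pad_token_id.
model.config.decoder_start_token_id = 0  # e.g. tokenizer.bos_token_id
model.config.pad_token_id = 1            # e.g. tokenizer.pad_token_id
model.generation_config.decoder_start_token_id = 0  # newer transformers read this
model.generation_config.pad_token_id = 1

# One dummy 32x32 RGB image is enough to check that generation runs.
pixel_values = torch.randn(1, 3, 32, 32)
ids = model.generate(pixel_values, max_length=5)
```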

At that point, convert to fp16 and save with save_pretrained!

But really it should be one-shot trained! So you need to have a training run ready for this model before saving, if you have the GPU(s) and memory :slight_smile:
But when you instantiate the model at the beginning it takes up to 35 GB of GPU memory, hence trial and error!
So after this you still need at least 5-10 GB of memory to run a training, as the models are in memory! << Issue!
So I offloaded the model to disk (saved as pretrained, and suffered the loss of the first pass), but after SFT training it is still fine; it just takes a little longer. Hence tiny datasets to begin training random models, until they overfit to the dataset, enabling the first task to embed into the model :slight_smile:
On the next training run it can hopefully begin converging much quicker!
Models do not converge instantly; they need many samples and many epochs (best to use a small dataset of ~1k) and to have a few different ones ready! So once the first is overfit, go to the second and you will see how far away it is!
So repeat until it begins converging quicker, then you can train the model properly (also keep changing the LoRA config so you get different parameters to tune)!