Kosmos-2 Fine tuning

sutantowilliam · July 6, 2024, 3:06pm

Hi @Mit1208, thanks for sharing the notebooks. I am also working on fine-tuning Kosmos-2.
I want to clarify: do we need to resize the images to 1025*1025, or is the size flexible as long as all the images have the same size?
Thank you!

Stephencoder · August 19, 2024, 12:50pm

Hi @ydshieh, thanks for your response. I have a further question based on this code example.

I am working on fine-tuning KOSMOS-2 to predict multiple continuous variables.
For example:
the INPUT of the model includes “image” + “instruction”,
the OUTPUT should be 4 continuous variables, named “prediction”, each in the range [0, 100] (the tokens for these numbers are already included in the tokenizer’s vocabulary).
I would like to fine-tune KOSMOS-2 to output the “prediction” given the input “image” + “instruction”.

My question is: how should I set inputs['input_ids'] and inputs['labels']?
__
My current code is:
# "/delimiter" is a special delimiter token added to the tokenizer vocabulary
prompt = "instruction" + "/delimiter" + "prediction"
image = "image"
inputs = processor(text=prompt, images=image)

labels = inputs['input_ids'].clone()
labels[inputs['input_ids'] == 1] = -100
inputs['labels'] = labels

inputs['input_ids'] = inputs['input_ids'][:delimiter_token_index] + [1] * (len(inputs['input_ids']) - delimiter_token_index)
__
Am I setting inputs['input_ids'] and inputs['labels'] correctly?

Thanks for reading!
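
For comparison, here is a minimal sketch of the convention commonly used when fine-tuning causal language models with 🤗 Transformers: input_ids keep the full sequence (instruction, delimiter, and prediction), and only the labels are masked with -100 so the loss covers just the prediction tokens. It is not confirmed in this thread as the right recipe for KOSMOS-2. The helper name build_labels, the toy token ids, and the delimiter id 999 are all hypothetical; the pad id 1 is taken from the code in the post above.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def build_labels(input_ids, delimiter_id, pad_id=1):
    """Return labels that supervise only the tokens after the delimiter.

    input_ids is left untouched: the model still needs to see the
    prediction tokens as input during teacher-forced training.
    """
    if delimiter_id not in input_ids:
        raise ValueError("delimiter token not found in input_ids")
    start = input_ids.index(delimiter_id) + 1  # index of the first prediction token
    labels = []
    for pos, tok in enumerate(input_ids):
        if pos < start or tok == pad_id:
            labels.append(IGNORE_INDEX)  # instruction, delimiter, and padding: no loss
        else:
            labels.append(tok)  # prediction tokens (and end-of-sequence): supervised
    return labels


# Toy example with made-up ids: 999 stands in for the "/delimiter" token,
# 2 for an assumed end-of-sequence token, 1 for padding.
input_ids = [0, 11, 12, 13, 999, 42, 7, 88, 5, 2, 1, 1]
labels = build_labels(input_ids, delimiter_id=999)
print(labels)
# [-100, -100, -100, -100, -100, 42, 7, 88, 5, 2, -100, -100]
```

With real data, the same mask would be applied to the tensor returned by the processor, and any image placeholder tokens that precede the instruction fall before the delimiter, so they are masked as well. Replacing the prediction positions in input_ids with padding, as in the last line of the posted code, would hide the target tokens from the model during training, which is usually not what is wanted.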
