Kosmos-2 Fine-Tuning

Hi @Mit1208, thanks for sharing the notebooks. I am also working on fine-tuning Kosmos-2.
One thing I want to clarify: do we need to resize the images to 1025*1025, or is the size flexible as long as all the images have the same size?
Thank you!

Hi @ydshieh, thanks for your response. I have a further question based on this code example.

I am working on fine-tuning KOSMOS-2 to predict multiple continuous variables.
For example:
the INPUT of the model is “image” + “instruction”,
the OUTPUT should be 4 continuous variables, named “prediction”, each in the range [0, 100] (the tokens for these numbers are already included in the tokenizer’s vocabulary).
I would like to fine-tune KOSMOS-2 to output the “prediction” given the input “image” + “instruction”.

My question is: how should I set inputs['input_ids'] and inputs['labels']?
__
My current code is:
# "/delimiter" is a special delimiter token added to the tokenizer vocabulary
prompt = "instruction" + "/delimiter" + "prediction"
images = "image"
inputs = processor(text=prompt, images=images)

# copy the full sequence as labels and ignore padding (token id 1) in the loss
labels = inputs['input_ids'].clone()
labels[inputs['input_ids'] == 1] = -100
inputs['labels'] = labels

# keep the tokens before the delimiter and replace everything after it with padding (token id 1)
inputs['input_ids'] = inputs['input_ids'][:delimiter_token_index] + [1] * (len(inputs['input_ids']) - delimiter_token_index)
__
Am I setting inputs['input_ids'] and inputs['labels'] correctly?

Thanks for reading!
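
For comparison, here is a minimal sketch of the label-masking pattern commonly used for causal language model fine-tuning. It is not taken from the shared notebooks. It assumes that the model shifts the labels internally (as Hugging Face causal LM heads generally do, and which should be checked for Kosmos-2), so input_ids keeps the full sequence including the “prediction” tokens, and only the labels are masked: everything up to and including the delimiter, plus padding, is set to -100. The helper name build_labels and all token ids below are hypothetical; the padding id 1 is the one used in the post above. Plain Python lists are used so the sketch runs without loading the model; the same masking applies to tensors.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def build_labels(input_ids, delimiter_id, pad_id=1):
    """Return labels that supervise only the tokens after the delimiter.

    Hypothetical helper: input_ids is one tokenized sequence (a list of ints)
    that still contains the prediction tokens after the delimiter.
    """
    # Index of the first delimiter token; raises ValueError if it is missing.
    split = input_ids.index(delimiter_id)
    labels = []
    for position, token in enumerate(input_ids):
        if position <= split or token == pad_id:
            # Prompt tokens (including the delimiter) and padding get no loss.
            labels.append(IGNORE_INDEX)
        else:
            # Prediction tokens (and the end-of-sequence token) are supervised.
            labels.append(token)
    return labels


# Hypothetical ids: 0 = BOS, 900 = delimiter, 2 = EOS, 1 = padding.
input_ids = [0, 11, 12, 13, 900, 41, 42, 43, 44, 2, 1, 1]
labels = build_labels(input_ids, delimiter_id=900)
print(labels)
```

Under these assumptions, input_ids would be passed to the model unchanged. Replacing the prediction tokens in input_ids with padding would remove the context the model needs to predict each next prediction token during training.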