Hi @Mit1208, thanks for sharing the notebooks. I am also working on fine-tuning Kosmos-2.
I want to clarify: do we need to resize the images to 1025*1025, or is the size flexible as long as all the images have the same size?
Thank you!
Hi @ydshieh, thanks for your response. I have a further question based on this code example.
I am working on fine-tuning KOSMOS-2 to predict multiple continuous variables.
For example:
the INPUT of the model consists of "image" + "instruction",
the OUTPUT should be 4 continuous variables, referred to as "prediction", each in the range [0, 100] (the tokens for these numbers are already included in the tokenizer's vocabulary).
I would like to fine-tune KOSMOS-2 to output the "prediction" given the input "image" + "instruction".
My question is: how should I set inputs["input_ids"] and inputs["labels"]?
__
My current code is:
prompt = "instruction" + "/delimiter" + "prediction"  # "/delimiter" is a special delimiter token added to the tokenizer vocabulary
images = "image"
inputs = processor(text=prompt, images=images)
labels = inputs["input_ids"].clone()
labels[inputs["input_ids"] == 1] = -100
inputs["labels"] = labels
# pseudocode: delimiter_token_index is the position of "/delimiter" in input_ids;
# keep the tokens before it and replace everything from it onward with token id 1
inputs["input_ids"] = inputs["input_ids"][:delimiter_token_index] + [1] * (len(inputs["input_ids"]) - delimiter_token_index)
__
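For comparison, the other convention I have seen for causal-LM fine-tuning leaves input_ids unchanged and masks only the labels, so that the loss is computed on the "prediction" tokens alone. Below is a minimal sketch of that masking using plain Python lists. The helper name `build_labels` and all token ids are made up for illustration, and I have not confirmed that this is what KOSMOS-2 expects.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_labels(input_ids, delimiter_id, pad_id=1):
    """Hypothetical helper: copy input_ids into labels, masking the prompt
    (everything up to and including the delimiter) and any padding tokens."""
    delimiter_pos = input_ids.index(delimiter_id)
    labels = []
    for pos, token_id in enumerate(input_ids):
        if pos <= delimiter_pos or token_id == pad_id:
            labels.append(IGNORE_INDEX)  # no loss on prompt or padding
        else:
            labels.append(token_id)      # loss on the prediction tokens
    return labels


# Made-up ids: 0 = BOS, 10..12 = instruction, 99 = "/delimiter",
# 41..44 = the four predicted values, 2 = EOS, 1 = padding.
input_ids = [0, 10, 11, 12, 99, 41, 42, 43, 44, 2, 1, 1]
labels = build_labels(input_ids, delimiter_id=99)
print(labels)
```

With this layout, input_ids still contains the prediction tokens during training, and only the labels decide which positions contribute to the loss.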
Am I setting inputs["input_ids"] and inputs["labels"] correctly?
Thanks for reading!