I hope you are well. I am trying to train the Kosmos-2 model on the DocLayNet dataset. I have prepared the data, but I am getting an error during training.

My training process is:

  1. Convert the data into Kosmos-2 format.
  2. Convert it to tensors with the processor, using the code below:
     inputs = processor(images=test2_df['image'].to_list(), text=test2_df['text'].to_list(), bboxes=test2_df['float_val'].to_list(), padding=True, return_tensors="pt").to(device)
     dataset = Dataset.from_dict(inputs)
  3. Split the dataset into train and test sets and pass them to Trainer.
  4. Train the model.

I am getting this error:

/usr/local/lib/python3.10/dist-packages/transformers/models/kosmos2/ in forward_embedding(self, input_ids, inputs_embeds, image_embeds, img_input_mask, past_key_values_length, position_ids)
   1150         print(image_embeds)
   1151         if image_embeds is not None:
-> 1152             inputs_embeds[] =
   1153                 -1, image_embeds.size(-1)
   1154             )

RuntimeError: Index put requires the source and destination dtypes match, got Float for the destination and Half for the source.

Can you please advise how to resolve this error?


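The problem was in my TrainingArguments(). I had set fp16=True, which caused the dtype mismatch shown in the error: the image embeddings were in half precision (Half) while the input embeddings they were written into stayed in float32 (Float). I removed fp16=True from the arguments and training worked.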
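
For anyone hitting the same error, here is a minimal sketch of the change, assuming the standard transformers TrainingArguments. Only the fp16 setting comes from my actual fix; the output directory, batch size, and epoch count are placeholder values, and make_training_args is a hypothetical helper name used just for this example.

```python
# Keyword arguments for transformers.TrainingArguments.
# Every value except fp16 is a placeholder for illustration only.
training_kwargs = {
    "output_dir": "kosmos2-doclaynet",   # hypothetical output directory
    "per_device_train_batch_size": 2,    # placeholder value
    "num_train_epochs": 1,               # placeholder value
    # Previously fp16=True. Leaving it off (False is the default) avoided
    # the Float/Half mismatch in forward_embedding.
    "fp16": False,
}


def make_training_args(kwargs=None):
    """Build TrainingArguments from the dict above.

    The import is done lazily so the dict can be inspected even where
    transformers is not installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(**(kwargs or training_kwargs))
```

The result of make_training_args() would then be passed as args= to Trainer together with the model and the train and test splits. I have not checked whether other mixed-precision settings trigger the same error, so treat this as what worked in my setup rather than the only possible fix.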

