Kosmos-2 Fine-tuning

I will add the code below to handle padding. Let me know if this is the wrong way to do it.

labels = inputs['input_ids'].clone()
labels[inputs['attention_mask'] == 0] = -100  # ignore padded positions in the loss
inputs['labels'] = labels

I think the padding token id is 1 instead of 0. You can see that in Kosmos2TextConfig, as well as by checking the example you have in the notebook.
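
A quick way to verify the pad token id (a small sketch, assuming the public microsoft/kosmos-2-patch14-224 checkpoint):

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
print(processor.tokenizer.pad_token_id)  # prints 1 for Kosmos-2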

Other than this, I think that’s it!


I was using attention_mask, so it was working with 0, but for input_ids it’s 1. I checked the config file; the padding token id is 1.

The new code would look like this:

labels = inputs['input_ids'].clone()
labels[inputs['input_ids'] == 1] = -100  # pad token id is 1 for Kosmos-2
inputs['labels'] = labels
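
In a full pipeline, this masking would typically live in the data collator. A minimal sketch, assuming processor is a Kosmos2Processor and each example carries a text and a PIL image (the field names are illustrative):

def collate_fn(batch):
    texts = [example['text'] for example in batch]
    images = [example['image'] for example in batch]
    inputs = processor(text=texts, images=images, padding=True, return_tensors='pt')
    labels = inputs['input_ids'].clone()
    labels[inputs['input_ids'] == 1] = -100  # mask pad tokens (id 1) out of the loss
    inputs['labels'] = labels
    return inputs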

@Mit1208 Did you successfully enable FP16 in your training notebook?

@yuerlong fp16=True in the training arguments was giving me an error, so I removed that argument.

FYI - Issue on Kosmos-2 model training on new dataset

@Mit1208 I just added a line before line 1152 in modeling_kosmos2.py to temporarily enable fp16:

inputs_embeds = inputs_embeds.to(image_embeds.dtype)
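
For context, this is the usual fp16 dtype-mismatch pattern that the cast works around (a generic illustration, not the actual modeling_kosmos2.py code):

import torch

layer = torch.nn.Linear(8, 8).half()  # model weights in fp16
inputs_embeds = torch.randn(1, 4, 8)  # embeddings still in fp32

# layer(inputs_embeds) would raise a dtype-mismatch RuntimeError;
# casting the embeddings to the weights' dtype first fixes it:
out = layer(inputs_embeds.to(torch.float16))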


Hello everyone @Mit1208 and @ydshieh

I am still eager to see if we can adapt Kosmos-2 for my task 🙂

Did you get any working examples yet? I see the Colab notebook you shared, but I don’t know if it’s up to date. Any news?

Thanks,
CDH

hi @cdh

I think I have everything running fine, thanks to @ydshieh. I couldn’t train the model just because of GPU resources, but I can share a Colab which has everything. I just need to adjust a few things. I will try to share it in a few hours.

Hi @cdh

Here is my final code, which shows Kosmos-2 fine-tuning. Because of the GPU limitation, I reduced the layer configuration. You can remove the config parameter while loading the model and you are good to go.
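
For reference, loading without the reduced config is just the standard call (a sketch, assuming the public microsoft/kosmos-2-patch14-224 checkpoint):

from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
model = Kosmos2ForConditionalGeneration.from_pretrained("microsoft/kosmos-2-patch14-224")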

My code:

Finetuning code

Hi, thanks a lot for the impressive work. I am also trying to fine-tune KOSMOS-2, and I checked your Colab notebook. One question: does the code train a KOSMOS-2 from scratch on a customized dataset, or fine-tune the model with LoRA or other fine-tuning methods?

Thanks a lot

Nice, thank you very much.

Could you please briefly explain what the problem was and what the solution is? It’s hard to go through all your code without the knowledge you have now. That way, I could understand how to apply that solution to my problem. Even pointers to the code saying “here I changed this into this” or “here I added this line”, etc., would help.

Thanks again!

For anyone other than @Mit1208 (as he knows what’s going on now), the following two replies are the two changes necessary.

@cdh Unfortunately, I won’t have the bandwidth to dive into the notebook you provided, especially as it contains a lot of custom code and customization.

Regarding the question about labels, see the above 2 links.

For general training with your custom dataset/model, I would recommend:

  • Try a simple dataset (with just a few text/image pairs), train the model on that tiny dataset (probably starting from the pretrained one, but you can of course also try from scratch), and see if the loss decreases and the model gives the desired generation (on the trained examples); see the sketch after this list.

  • Always try to look at the examples (before processing, and after being processed by the Kosmos2 processor), and make sure you understand the output of the processor (which is the input to the model).

  • Once you get familiar with the above, think about what would/should be adjusted for your custom dataset and model.
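
A minimal sketch of that overfitting sanity check, assuming the public microsoft/kosmos-2-patch14-224 checkpoint; the image file and caption are placeholders for your own data:

import torch
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
model = Kosmos2ForConditionalGeneration.from_pretrained("microsoft/kosmos-2-patch14-224")

image = Image.open("example.jpg")  # placeholder image
inputs = processor(text="<grounding> An image of a snowman.", images=image, return_tensors="pt")

labels = inputs['input_ids'].clone()
labels[inputs['input_ids'] == 1] = -100  # mask padding, as discussed above
inputs['labels'] = labels

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for step in range(50):  # the loss should go toward 0 on this single example
    outputs = model(**inputs)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(step, outputs.loss.item())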

Thank you, I already tried to implement this behavior in my code with this notation:

# mask the prompt part of each sequence with -100 so the loss is only computed on the labels
labels_ids = [[-100] * indexes[i] + labels_id[indexes[i]:] for i, labels_id in enumerate(labels_ids_tmp)]
# keep the prompt tokens and pad the rest with the pad token id (1)
prompt_ids = [prompt_id[:indexes[i]] + [1] * (len(input_ids[0]) - indexes[i]) for i, prompt_id in enumerate(prompt_ids_tmp)]

As far as I know, this should be the same as your solution, am I right? But here I divided my text into a prompt (the input) and a label (the output).
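
A toy illustration of that prompt/label split, with hypothetical token ids:

input_ids = [0, 5, 6, 7, 8, 2]  # first 3 tokens are the prompt, the rest the label
index = 3
labels = [-100] * index + input_ids[index:]  # [-100, -100, -100, 7, 8, 2]
# the loss is then computed only on the label (output) tokens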

Could you remind me what issue you’re having? Is it still about

ValueError: The model did not return a loss from the inputs, only the following keys: logits,past_key_values,image_embeds,projection_attentions,vision_model_output. For reference, the inputs it received are pixel_values,input_ids,attention_mask,image_embeds_position_mask.

that you opened?

@cdh, I will add comments and reasoning to my code so it will be easier to follow (give me some time).
I made the changes @ydshieh mentioned in the code, so it should work for you.

You’re right, sorry for not reminding you of my issue.

I managed to add the -100 ids to the input with the above code, but when I trained the model the output was not coherent. For a given <image, prompt> pair I get a response A; if I change the image while keeping the same prompt, the response is the exact same A as before, with a completely wrong bounding box.
I am working with images from an automation house, and I want the model to produce the bounding boxes (among other things) of objects in the image. If I change the image, the model should produce different bounding boxes, but it turns out that it produces the same response A, where the bounding box points at the wall, and there’s nothing useful there.

My intuition here is that at some point during training the model started to disregard the images and focus on the text part only. So I was wondering whether my way of adding the -100 ids really is the same as yours.

I will try your method and see if there’s any improvement, but it will take a while to train.

Thanks again.