I am trying to fine-tune a Mask2Former model for a binary segmentation task where 0 is the background and 1 is my object. I am initializing the processor and model in the following way:
My dataset randomly rotates and flips my training and validation data, and it also takes a RandomCrop of the image and masks. This setup has already worked for training an LRASPP model from scratch.
However, every time I try to train the Mask2Former model, the accuracy converges to 0 after a few epochs and the model starts outputting only null tensors.
Is something wrong with my initialization? I have already looked for other threads about this, but there aren't many. There is one on Hugging Face from a year ago, but it was never answered.
I will check whether guaranteeing that the input is never a null tensor improves things, but at this point I am not very hopeful.
I tried using do_reduce_labels at one point, but that led to the model always returning a null tensor.
I set ignore_index = 0 because my background is labeled 0 and the object I want to detect is labeled 1. When I read the masks in, the background is 0 and the object is 255, but I divide the mask tensor by 255 to get values of 0 and 1, because I had trouble with other models when I did not do this.
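For reference, here is a minimal sketch of the label arithmetic described above. It is not the library's code. It assumes that `do_reduce_labels` replaces label 0 with `ignore_index` and decrements every other label by 1 (as the option is documented), and that labels equal to `ignore_index` are then dropped from the targets. The helper names `reduce_labels` and `remaining_labels` are hypothetical.

```python
import numpy as np

def reduce_labels(seg_map, ignore_index):
    # Assumed do_reduce_labels behaviour: 0 -> ignore_index, other labels minus 1.
    return np.where(seg_map == 0, ignore_index, seg_map - 1)

def remaining_labels(seg_map, ignore_index):
    # Assumption: labels equal to ignore_index are excluded from the targets.
    labels = np.unique(seg_map)
    return labels[labels != ignore_index]

# A tiny 0/255 mask as read from disk, converted to integer class ids 0/1.
raw = np.array([[0, 255], [255, 0]], dtype=np.uint8)
mask = (raw > 127).astype(np.int64)

# ignore_index=0, no reduction: only the object class (1) is left.
labels_plain = remaining_labels(mask, ignore_index=0)

# ignore_index=0 with reduction: object 1 becomes 0, which is also ignored.
labels_reduced = remaining_labels(reduce_labels(mask, ignore_index=0), ignore_index=0)

# ignore_index=255 with reduction: background is ignored, object becomes class 0.
labels_255 = remaining_labels(reduce_labels(mask, ignore_index=255), ignore_index=255)

print(labels_plain, labels_reduced, labels_255)
```

Under these assumptions, combining `do_reduce_labels` with `ignore_index = 0` leaves no labels at all, which would be one possible explanation for the empty outputs. It is worth checking against the installed transformers version.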
I will try your suggestion though and post my answer.
I have now set it up so that the background is 255 and the object is 0, with ignore_index set to 255 and reduce_labels set to False. I even made sure that every random crop contains at least several thousand object pixels, so that the model does not learn from empty tensors, and it is still not working.
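A crop constraint like the one described can be sketched as follows. This is only an illustration, not the poster's actual implementation: the function name `crop_with_object`, its parameters, and the retry strategy are all assumptions. It retries random square crops until one contains enough object pixels and otherwise falls back to the best crop it saw.

```python
import numpy as np

def crop_with_object(image, mask, size, min_object_pixels,
                     object_value=0, max_tries=50, rng=None):
    """Random square crop that tries to keep at least min_object_pixels object pixels."""
    rng = rng or np.random.default_rng()
    h, w = mask.shape
    best = None  # (object pixel count, top, left) of the best crop seen so far
    for _ in range(max_tries):
        top = int(rng.integers(0, h - size + 1))
        left = int(rng.integers(0, w - size + 1))
        window = mask[top:top + size, left:left + size]
        count = int((window == object_value).sum())
        if best is None or count > best[0]:
            best = (count, top, left)
        if count >= min_object_pixels:
            break  # this crop is good enough
    count, top, left = best
    return (image[top:top + size, left:left + size],
            mask[top:top + size, left:left + size],
            count)

# Example: 64x64 image, background 255, a 32x32 object (value 0) in the centre.
image = np.zeros((64, 64, 3), dtype=np.uint8)
mask = np.full((64, 64), 255, dtype=np.uint8)
mask[16:48, 16:48] = 0
img_crop, mask_crop, n_object = crop_with_object(
    image, mask, size=32, min_object_pixels=200, rng=np.random.default_rng(0))
print(img_crop.shape, mask_crop.shape, n_object)
```

Returning the object pixel count makes it easy to log how often the fallback is used, which helps confirm that empty targets are really ruled out.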
Hi @beschmitt
The example I provided above seems to work without errors. However, if the image processor doesn't work for you, you can skip it and prepare the model input yourself; just make sure it is in the format the model expects.
First, you can take a look at the officially provided example (see the links above), then prepare your input to be in the same format.
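As a rough sketch of what preparing the input yourself could look like: `Mask2FormerForUniversalSegmentation` accepts `pixel_values` together with `mask_labels` (per image, one binary mask per class present, shape `(num_labels, height, width)`) and `class_labels` (per image, the matching class ids, shape `(num_labels,)`). The helper below, `to_mask2former_targets`, is a hypothetical name and uses NumPy only. Converting the results to tensors and collecting them into per-batch lists is left out and should be checked against the official example.

```python
import numpy as np

def to_mask2former_targets(seg_map, ignore_index=255):
    """Split a semantic map into per-class binary masks and their class ids."""
    class_ids = np.unique(seg_map)
    class_ids = class_ids[class_ids != ignore_index]  # drop ignored pixels' label
    h, w = seg_map.shape
    if len(class_ids) == 0:
        # No labelled pixels in this image: return an empty set of masks.
        masks = np.zeros((0, h, w), dtype=np.float32)
    else:
        masks = np.stack([seg_map == c for c in class_ids]).astype(np.float32)
    return masks, class_ids.astype(np.int64)

# Example: background 255 (ignored), object 0, as in the setup described earlier.
seg_map = np.array([[255, 0],
                    [0, 255]], dtype=np.uint8)
masks, class_ids = to_mask2former_targets(seg_map, ignore_index=255)
print(masks.shape, class_ids)
```

Printing the shapes and class ids for a few training samples is a quick way to spot images whose targets end up empty.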