Hello, I have been attempting to train a HuggingFace object detection model from a pretrained backbone with autotrain-advanced.
To do so, I wanted to use the COCO dataset, as most models are pretrained on it. The issue I ran into is that COCO has non-consecutive category ids. I therefore converted the annotations into metadata.jsonl files, making sure the category ids are consecutive.
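A minimal sketch of that kind of remapping, assuming a standard COCO-style dict with `categories` and `annotations` lists. The function name `remap_category_ids` and the tiny inline sample are illustrative only, not the exact script used:

```python
def remap_category_ids(coco):
    """Return a copy of a COCO-style dict with category ids renumbered 0..N-1.

    Also returns the old-id -> new-id mapping so predictions can be mapped back.
    """
    # Sort the original ids so the new numbering is deterministic.
    old_ids = sorted(cat["id"] for cat in coco["categories"])
    old_to_new = {old: new for new, old in enumerate(old_ids)}

    categories = [
        {**cat, "id": old_to_new[cat["id"]]} for cat in coco["categories"]
    ]
    annotations = [
        {**ann, "category_id": old_to_new[ann["category_id"]]}
        for ann in coco["annotations"]
    ]
    return {**coco, "categories": categories, "annotations": annotations}, old_to_new


# Illustrative sample: COCO skips some ids (for example, there is no id 12).
sample = {
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 11, "name": "fire hydrant"},
        {"id": 13, "name": "stop sign"},
    ],
    "annotations": [
        {"image_id": 0, "category_id": 13, "bbox": [10, 20, 30, 40]},
        {"image_id": 0, "category_id": 1, "bbox": [5, 5, 50, 80]},
    ],
}

remapped, mapping = remap_category_ids(sample)
```

Keeping the returned mapping around makes it possible to translate predicted labels back to the original COCO ids for evaluation.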
Is there a way to train an object detection model like “facebook/detr-resnet-50” on the COCO dataset, starting only from the pretrained ResNet backbone?
Why am I using autotrain-advanced for training from scratch? I wanted to test autotrain for that purpose.
What have I tried so far?
First, I attempted to use autotrain as-is, without modifying its source code. Since DETR is already pretrained on COCO, this is essentially “fine-tuning” on the same data with different ids, which isn’t what I’m aiming for. Moreover, during training, the losses would decrease for a few epochs, then increase, and then decrease again very slowly.
I also tried filling all parameters with 0., but without success. No predictions were made (the mAP values stayed at 0) for 60 epochs, and the losses did not decrease after that.
Finally, I set the backbone to the weights of Torchvision’s resnet50 model, initialized the other parameters with torch.nn.init.xavier_normal_ where possible, and filled the rest (one-dimensional parameters) with 0. or 1., as in the DETR paper and code. I’m currently testing this approach, but even if it works, I don’t believe it’s the most efficient way to do it.
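A rough sketch of this initialization scheme, shown on a toy module rather than the real DETR. The helper name `reinit_non_backbone`, the `"backbone."` prefix, and the rule "1-D `weight` gets 1., other 1-D parameters get 0." are assumptions for illustration; the actual parameter names in the HuggingFace model should be checked with `named_parameters()`:

```python
import torch
from torch import nn


def reinit_non_backbone(model: nn.Module, backbone_prefix: str = "backbone.") -> None:
    """Re-initialize every parameter whose name does not start with backbone_prefix."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name.startswith(backbone_prefix):
                continue  # keep the (pretrained) backbone weights untouched
            if param.dim() > 1:
                # Xavier init needs at least 2 dimensions (fan-in and fan-out).
                nn.init.xavier_normal_(param)
            elif name.endswith("weight"):
                nn.init.ones_(param)  # assumed: 1-D scales such as LayerNorm weight
            else:
                nn.init.zeros_(param)  # assumed: biases and other 1-D parameters


class ToyDetector(nn.Module):
    """Stand-in with a 'backbone' plus a small head, only to exercise the helper."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, kernel_size=3)
        self.norm = nn.LayerNorm(8)
        self.head = nn.Linear(8, 4)


toy = ToyDetector()
backbone_before = toy.backbone.weight.clone()
reinit_non_backbone(toy)
```

On the real model, the backbone weights would first be loaded from Torchvision's resnet50 state dict, and the prefix adjusted to wherever the backbone lives in the module tree.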
Thanks in advance for any help or suggestions.