DETR vs. TableTransformer: the models have the same layers, but DETR doesn't learn

I tried loading the weights of TableTransformerForObjectDetection into a DETR model and training it, but it does not learn anything: after 10 epochs on 50,000 samples, the F1 score remains 0. The model learns fine if I load the same weights with the TableTransformerForObjectDetection class instead, using the same training configuration as for DETR.

AFAIK, the Table Transformer model is just a DETR model, and when I compare the weights, the only difference is a layer norm in the encoder: {'model.encoder.layernorm.bias', 'model.encoder.layernorm.weight'}. Does such a minor difference explain such a drastic difference in training behavior?
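For reference, here is a minimal sketch of the kind of key comparison described above. The helper name `diff_state_dict_keys` and the toy key lists are hypothetical stand-ins for illustration, not the actual checkpoint contents. Note that it only compares parameter names, so it would not reveal differences in how the layers are wired together in the forward pass.

```python
from typing import Iterable, Set, Tuple


def diff_state_dict_keys(
    keys_a: Iterable[str], keys_b: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (keys only in A, keys only in B) for two collections of parameter names."""
    a, b = set(keys_a), set(keys_b)
    return a - b, b - a


# Toy stand-ins (assumed names, for illustration only); with real models,
# these would come from model.state_dict().keys().
table_transformer_keys = [
    "model.encoder.layers.0.self_attn.q_proj.weight",
    "model.encoder.layernorm.weight",
    "model.encoder.layernorm.bias",
]
detr_keys = [
    "model.encoder.layers.0.self_attn.q_proj.weight",
]

only_in_tt, only_in_detr = diff_state_dict_keys(table_transformer_keys, detr_keys)
print(sorted(only_in_tt))    # parameters the DETR model has no slot for
print(sorted(only_in_detr))  # parameters DETR would have to initialize itself
```

With the actual models, the same helper could be called as `diff_state_dict_keys(tt_model.state_dict().keys(), detr_model.state_dict().keys())`, where `tt_model` and `detr_model` are the two loaded models.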

The checkpoint I am using is microsoft/table-transformer-structure-recognition.