RT-DETR attention map dimension - PekingU/rtdetr_r50vd

halyusuf · December 20, 2024, 11:11am

I'm using the model below:

import transformers as tr

pretrained_model = tr.RTDetrForObjectDetection.from_pretrained(
    pretrained_model_name_or_path="PekingU/rtdetr_r50vd",
    output_attentions=True,
)

The decoder's output attention map has shape (batch_size, num_queries, num_heads, 3, 4), although the documentation says it should be (batch_size, num_queries, num_heads, 4, 4). Nothing in the documentation explains what these numbers mean exactly. I suspected that the first 4 is num_features and the second 4 is the number of offsets in the deformable attention.

When I change num_features to 4 in the configuration, I get the error below during inference:

TypeError: conv2d() received an invalid combination of arguments - got (list, Parameter, NoneType, tuple, tuple, tuple, int), but expected one of:

(Tensor input, Tensor weight, Tensor bias = None, tuple of ints stride = 1, tuple of ints padding = 0, tuple of ints dilation = 1, int groups = 1)
didn't match because some of the arguments have invalid types: (list of [Tensor, Tensor, Tensor], Parameter, NoneType, tuple of (int, int), tuple of (int, int), tuple of (int, int), int)

(Tensor input, Tensor weight, Tensor bias = None, tuple of ints stride = 1, str padding = "valid", tuple of ints dilation = 1, int groups = 1)
didn't match because some of the arguments have invalid types: (list of [Tensor, Tensor, Tensor], Parameter, NoneType, tuple of (int, int), tuple of (int, int), tuple of (int, int), int)

So my questions are: what do the first and second 4 in the output dimensions mean? Why am I getting (3, 4)? And how can I fix it?
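
One possible reading of the shape, offered as a sketch rather than a confirmed answer: in multi-scale deformable attention, each (query, head) pair gets one weight per feature level and per sampling point, so the last two dimensions would be (number of feature levels, number of sampling points). The snippet below is plain Python with no model download. The helper names `attention_shape` and `split_weights` are hypothetical, and the values 3 levels, 4 points, 8 heads, and 300 queries are assumed defaults for this checkpoint, not values taken from the post.

```python
import math


def attention_shape(batch_size, num_queries, num_heads, num_levels, num_points):
    """Shape of one decoder layer's deformable cross-attention weights (hypothetical helper)."""
    return (batch_size, num_queries, num_heads, num_levels, num_points)


def split_weights(logits, num_levels, num_points):
    """Softmax over all levels * points logits for one (query, head), then regroup by level."""
    assert len(logits) == num_levels * num_points
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    flat = [e / total for e in exps]
    # One row per feature level, one column per sampling point.
    return [flat[lvl * num_points:(lvl + 1) * num_points] for lvl in range(num_levels)]


# Assumed defaults: 3 feature levels, 4 sampling points, 8 heads, 300 queries.
default_shape = attention_shape(1, 300, 8, num_levels=3, num_points=4)
weights = split_weights([0.0] * 12, num_levels=3, num_points=4)

print(default_shape)                  # (1, 300, 8, 3, 4)
print(len(weights), len(weights[0]))  # 3 4
```

Under that reading, a trailing (3, 4) would simply reflect three feature levels, and the weights for one query and head sum to 1 across all 12 entries rather than per level. If so, the (4, 4) in the documentation may describe a configuration with four levels rather than this checkpoint.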
