@nielsr
In the line above, the first token of the transformer's output is used to compute the classification logits. I am confused because, in the source code, the learnable classification token is not at the zeroth index. The cls_token is concatenated at the start of the image patch tokens:
visual_tokens = torch.cat([cls_token, visual_patch_embeddings], dim=1)
and to create the transformer input, the text+bbox inputs and the image tokens are concatenated as follows:
transformer_inp = torch.cat([text_embeddings, visual_tokens], dim=1)
This means the classification token sits at index 512, given that we cap the text input at 512 tokens.
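As a minimal sketch of the indexing I mean (the shapes, the 512-token text cap, and the patch count are hypothetical, just to show where the [CLS] token lands):

```python
import torch

batch, hidden = 2, 768
text_seq_len = 512     # assumed cap on text tokens
num_patches = 196      # e.g. a 14x14 patch grid; hypothetical

text_embeddings = torch.randn(batch, text_seq_len, hidden)
cls_token = torch.randn(batch, 1, hidden)  # learnable [CLS]
visual_patch_embeddings = torch.randn(batch, num_patches, hidden)

# [CLS] is prepended to the image patches, not to the text
visual_tokens = torch.cat([cls_token, visual_patch_embeddings], dim=1)
transformer_inp = torch.cat([text_embeddings, visual_tokens], dim=1)

# so [CLS] sits at index text_seq_len, not at index 0
cls_output = transformer_inp[:, text_seq_len]  # index 512 here
first_token = transformer_inp[:, 0]            # this is a text token
```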
This is just for clarification; using the first token for classification also works fine.
Thanks