Info about insertion of "distillation_token" into the audio spectrogram transformer class

hisoka94 · October 4, 2023, 5:05pm

Hi! I'd like to know why the AST (Audio Spectrogram Transformer) model inserts a distillation token in front of the flattened audio patch embeddings, in addition to the standard [CLS] token. What is the role of this token, given that the original AST model does not use it?
Here is the code I'm referring to:

github.com

huggingface/transformers/blob/v4.31.0/src/transformers/models/audio_spectrogram_transformer/modeling_audio_spectrogram_transformer.py#L54


      
_SEQ_CLASS_EXPECTED_OUTPUT = "'Speech'"
_SEQ_CLASS_EXPECTED_LOSS = 0.17


AUDIO_SPECTROGRAM_TRANSFORMER_PRETRAINED_MODEL_ARCHIVE_LIST = [
    "MIT/ast-finetuned-audioset-10-10-0.4593",
    # See all Audio Spectrogram Transformer models at https://huggingface.co/models?filter=ast
]


class ASTEmbeddings(nn.Module):
    """
    Construct the CLS token, position and patch embeddings.
    """

    def __init__(self, config: ASTConfig) -> None:
        super().__init__()

        self.cls_token = nn.Parameter(torch.zeros(1, 1, config.hidden_size))
        self.distillation_token = nn.Parameter(torch.zeros(1, 1, config.hidden_size))
        self.patch_embeddings = ASTPatchEmbeddings(config)

Line 54 is where the ASTEmbeddings class is defined. The [CLS] and distillation tokens are created there as learnable parameters and are then used in the forward pass.
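To make the question concrete, here is a minimal sketch of how I understand the two tokens end up in front of the patch sequence. It is plain Python, with a list of strings standing in for the embedding tensor. The helper names `num_patches` and `build_sequence` are hypothetical, and the default values (128 mel bins, 1024 frames, 16x16 patches, stride 10) are my assumption about the configuration of the checkpoint above, not something taken from the excerpt.

```python
def num_patches(num_mel_bins=128, max_length=1024, patch_size=16,
                frequency_stride=10, time_stride=10):
    """Count the patches a strided patch_size x patch_size window yields (no padding).

    All default values are assumptions made for illustration.
    """
    freq_out = (num_mel_bins - patch_size) // frequency_stride + 1
    time_out = (max_length - patch_size) // time_stride + 1
    return freq_out * time_out


def build_sequence(patch_embeddings, cls_token="[CLS]", distillation_token="[DIST]"):
    """Prepend the two special tokens: [CLS] first, then the distillation token."""
    return [cls_token, distillation_token] + list(patch_embeddings)


# Strings stand in for the real embedding vectors.
patches = [f"patch_{i}" for i in range(num_patches())]
sequence = build_sequence(patches)

print(sequence[:3])   # ['[CLS]', '[DIST]', 'patch_0']
print(len(sequence))  # 1214 = 1212 patches + 2 special tokens
```

Under these assumptions the encoder sees two extra positions compared with the patch count alone, which is the part I am trying to understand.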

Thank you for your help and attention!
