How to manually add noise to embeddings for RoBERTa?

Hi Everybody!

I am trying to add some noise to the embeddings of my RoBERTa model, but I don't seem to succeed in doing so. I also have another interesting question that I have not been able to solve myself.

Why Noise?
I’m trying to explore how malicious noise can affect model training and how we could design countermeasures, for example in a wireless system.

What I did

  • I created a CustomRobertaModel
  • In the forward pass, I manually get the embedding weights and add some noise to them
  • Everything else is left as is

This is my class:

import torch
from transformers import RobertaForSequenceClassification, RobertaModel
from transformers.modeling_outputs import SequenceClassifierOutput


class CustomRobertaModel(RobertaForSequenceClassification):
    """
    Custom RoBERTa Model class that exposes embeddings, attentions, and head.
    """
    def __init__(self, config):
        super().__init__(config)
        self.roberta_base = RobertaModel(config)
        self.head = self.classifier

    def forward(self, 
                input_ids, 
                attention_mask=None, 
                token_type_ids=None, 
                position_ids=None, 
                head_mask=None, 
                labels=None):

        with torch.no_grad():
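            # Fresh Gaussian noise (mean 2, std 2) with the same shape as the word-embedding matrix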
            weight = self.roberta_base.embeddings.word_embeddings.weight
            noise = torch.normal(2, 2, size=weight.size()).to(weight.device)
            noisy_weights = weight + noise
            self.roberta_base.embeddings.word_embeddings.weight.data = noisy_weights.data

        # with torch.no_grad():
        #     self.roberta_base.embeddings.word_embeddings.weight.data.zero_()
        
        # Getting encoder outputs
        encoder_outputs = self.roberta(
            input_ids=input_ids,  
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask
        )

        with torch.no_grad():
            self.roberta_base.embeddings.word_embeddings.weight.data.zero_()

        sequence_output = encoder_outputs[0]
        
        # Passing through the classifier (head)
        logits = self.head(sequence_output)
        
        # Compute loss if labels are provided
        if labels is not None:
            loss = torch.nn.CrossEntropyLoss()(logits, labels)
            return SequenceClassifierOutput(loss=loss, logits=logits)
        else:
            return logits

I initialize from the pretrained roberta-base checkpoint and train as in any standard tutorial, roughly like the sketch below.
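
For context, this is roughly my training setup (a minimal sketch; the output directory, the hyperparameters, and train_dataset / eval_dataset are placeholders, not my exact configuration):

from transformers import RobertaConfig, RobertaTokenizer, Trainer, TrainingArguments

config = RobertaConfig.from_pretrained("roberta-base", num_labels=2)
model = CustomRobertaModel.from_pretrained("roberta-base", config=config)
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

# train_dataset / eval_dataset are assumed to be already tokenized datasets (placeholders here)
training_args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()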

Outcomes and Observations

  • There is absolutely no change in the training and validation results in each epoch
  • Even setting the embedding weights to zero, as in the uncommented part above, is not effective and does absolutely nothing
  • Interestingly enough, the documentation states that passing input_ids or the corresponding inputs_embeds should be equivalent (see the snippet right after this list). However, when I pass inputs_embeds instead of input_ids (without any noise), my training and validation results are worse, and I suspect the model is learning more slowly or not at all (what? and why?)
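
For reference, this is the kind of input_ids vs. inputs_embeds equivalence I mean, shown on a plain RobertaModel rather than my custom class (a minimal sketch with a toy sentence, assuming no padding and eval mode):

import torch
from transformers import RobertaModel, RobertaTokenizer

model = RobertaModel.from_pretrained("roberta-base")  # from_pretrained returns the model in eval mode
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
inputs = tokenizer("Hello world", return_tensors="pt")

# Variant 1: let the model look up the word embeddings itself
out_ids = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

# Variant 2: look up the word embeddings manually and pass them as inputs_embeds
embeds = model.embeddings.word_embeddings(inputs["input_ids"])
out_embeds = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"])

# With no padding and dropout disabled, the two should match
print(torch.allclose(out_ids.last_hidden_state, out_embeds.last_hidden_state))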

Questions

  • Is this the right way to do it? I want fresh noise on every forward pass, as if a jammer were sitting there and adding noise to the embeddings (roughly like the sketch after this list)
  • What causes the performance degradation when I use inputs_embeds instead of input_ids?
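
To make the intent concrete, this is the kind of per-batch injection I have in mind inside forward, in a subclass of RobertaForSequenceClassification (so self.roberta and self.classifier come from the parent). It is only a sketch of my intent, not something I have verified; noise_mean and noise_std are placeholder values matching my current experiment:

def forward(self, input_ids, attention_mask=None, labels=None):
    noise_mean, noise_std = 2.0, 2.0  # placeholder values

    # Look up the clean token embeddings for this batch
    inputs_embeds = self.roberta.embeddings.word_embeddings(input_ids)

    # "Jammer": fresh Gaussian noise on every forward pass, applied only to the
    # activations of this batch (the weight matrix itself stays untouched)
    noise = torch.randn_like(inputs_embeds) * noise_std + noise_mean
    noisy_embeds = inputs_embeds + noise

    outputs = self.roberta(inputs_embeds=noisy_embeds, attention_mask=attention_mask)
    logits = self.classifier(outputs[0])

    if labels is not None:
        loss = torch.nn.CrossEntropyLoss()(logits, labels)
        return SequenceClassifierOutput(loss=loss, logits=logits)
    return logits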

Thank you so much! Looking forward to more insights!

EDIT:

I have looked into the RoBERTa implementation and essentially copied the steps for computing the embeddings before they are passed into the encoder. However, even though I have copied the exact steps, the results are still different.

Here’s the updated code:

from transformers.models.roberta.modeling_roberta import create_position_ids_from_input_ids


class CustomRobertaModel(RobertaForSequenceClassification):
    """
    Custom RoBERTa Model class that exposes embeddings, attentions, and head.
    """
    def __init__(self, config):
        super().__init__(config)
        self.roberta_base = RobertaModel(config)
        self.head = self.classifier
    def forward(self, 
                input_ids, 
                attention_mask=None, 
                token_type_ids=None, 
                position_ids=None, 
                head_mask=None, 
                labels=None,
                inputs_embeds=None,
                past_key_values_length=0):
        # If no input embeddings are provided, compute them
        if inputs_embeds is None:
            inputs_embeds = self.roberta_base.embeddings.word_embeddings(input_ids)
        # Token Type Embeddings
        if token_type_ids is None:
            if hasattr(self.roberta_base.embeddings, "token_type_ids"):
                buffered_token_type_ids = self.roberta_base.embeddings.token_type_ids[:, :input_ids.size(1)]
                buffered_token_type_ids_expanded = buffered_token_type_ids.expand(input_ids.size(0), input_ids.size(1))
                token_type_ids = buffered_token_type_ids_expanded
            else:
                token_type_ids = torch.zeros(input_ids.size(), dtype=torch.long, device=self.roberta_base.embeddings.position_ids.device)
        token_type_embeddings = self.roberta_base.embeddings.token_type_embeddings(token_type_ids)
        
        # Positional Embeddings
        if position_ids is None:
            if input_ids is not None:
                position_ids = create_position_ids_from_input_ids(input_ids, self.roberta_base.embeddings.padding_idx, past_key_values_length)
            else:
                position_ids = self.roberta_base.embeddings.create_position_ids_from_inputs_embeds(inputs_embeds)
        position_embeddings = self.roberta_base.embeddings.position_embeddings(position_ids)
        # Combine the embeddings
        embeddings = inputs_embeds + token_type_embeddings
        if self.roberta_base.embeddings.position_embedding_type == "absolute":
            embeddings += position_embeddings
            
        # Layer Normalization and Dropout
        embeddings = self.roberta_base.embeddings.LayerNorm(embeddings)
        embeddings = self.roberta_base.embeddings.dropout(embeddings)
        # Getting encoder outputs
        encoder_outputs = self.roberta_base(
            inputs_embeds=embeddings,  
            attention_mask=attention_mask,
            position_ids=position_ids,
            token_type_ids=token_type_ids,
            head_mask=head_mask
        )
        sequence_output = encoder_outputs[0]
        
        # Passing through the classifier (head)
        logits = self.head(sequence_output)
        
        # Compute loss if labels are provided
        if labels is not None:
            loss = torch.nn.CrossEntropyLoss()(logits, labels)
            return SequenceClassifierOutput(loss=loss, logits=logits)
        else:
            return logits

This yields this output: tensor([[-0.0939, -0.3949]], grad_fn=<AddmmBackward0>)
though it should be: tensor([[-0.0952, -0.3919]], grad_fn=<AddmmBackward0>)

The difference might not seem like much, but my model's performance is worse. Also, there should be a way to replicate the results exactly!
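
For reference, this is how I sanity-check the manual embedding computation in isolation against the embeddings module itself (a sketch with a placeholder sentence); I would expect my custom forward to match the stock model in the same way:

import torch
from transformers import RobertaModel, RobertaTokenizer
from transformers.models.roberta.modeling_roberta import create_position_ids_from_input_ids

model = RobertaModel.from_pretrained("roberta-base").eval()
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
input_ids = tokenizer("some example sentence", return_tensors="pt")["input_ids"]

with torch.no_grad():
    # Reference: the embeddings module does everything internally
    ref = model.embeddings(input_ids=input_ids)

    # Manual: replicate the same steps by hand
    words = model.embeddings.word_embeddings(input_ids)
    position_ids = create_position_ids_from_input_ids(input_ids, model.embeddings.padding_idx)
    positions = model.embeddings.position_embeddings(position_ids)
    token_types = model.embeddings.token_type_embeddings(torch.zeros_like(input_ids))
    manual = model.embeddings.LayerNorm(words + token_types + positions)
    manual = model.embeddings.dropout(manual)

print(torch.allclose(ref, manual))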