How to manually add noise to embeddings for RoBERTa?

Hi Everybody!

I am trying to add some noise to the embeddings of my RoBERTa model, but I can't seem to get it to work. I also have a related question that I haven't been able to answer myself.

Why Noise?
I want to explore how malicious noise affects model training and how countermeasures could be designed, for example in a wireless system.

What I did

  • I created a CustomRobertaModel
  • In the forward pass, I manually take the embedding weights and add some noise to them
  • I left everything else as is

This is my class:

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaModel
from transformers.modeling_outputs import SequenceClassifierOutput


class CustomRobertaModel(RobertaForSequenceClassification):
    """Custom RoBERTa model class that exposes embeddings, attentions, and head."""

    def __init__(self, config):
        super().__init__(config)  # builds self.roberta and self.classifier
        self.roberta_base = RobertaModel(config)
        self.head = self.classifier

    def forward(self, input_ids=None, attention_mask=None, labels=None):
        # Gaussian noise (mean 2, std 2) shaped like the word-embedding matrix
        with torch.no_grad():
            weight = self.roberta_base.embeddings.word_embeddings.weight
            noise = torch.normal(2.0, 2.0, size=weight.size()).to(weight.device)
            noisy_weights = weight + noise

        # with torch.no_grad():
        # Getting encoder outputs
        encoder_outputs = self.roberta(input_ids=input_ids, attention_mask=attention_mask)

        with torch.no_grad():
            # Set the embedding weights to zero
            self.roberta_base.embeddings.word_embeddings.weight.zero_()

        sequence_output = encoder_outputs[0]
        # Passing through the classifier (head)
        logits = self.head(sequence_output)
        # Compute loss if labels are provided
        if labels is not None:
            loss = torch.nn.CrossEntropyLoss()(logits, labels)
            return SequenceClassifierOutput(loss=loss, logits=logits)
        return logits
```

I initialize the model from roberta-base and train it as in any other fine-tuning tutorial.

Outcomes and Observations

  • There is absolutely no change in the training and validation results in any epoch
  • Even setting the embedding weights to zero, as in the uncommented part above, has no effect at all
  • Interestingly, when I pass the embeddings (inputs_embeds) instead of the input_ids to get the encoder outputs, my training and validation results get worse even without any noise. According to the documentation the two should be equivalent. I suspect the model is learning more slowly, or not at all (why would that be?)
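A possible explanation for "absolutely no change" shows up in the class above. The line `noisy_weights = ... + noise` builds a new tensor and leaves the embedding parameter untouched. `noisy_weights` is also never passed to `self.roberta`. On top of that, the noise is computed from `self.roberta_base`, a second `RobertaModel`. That model is separate from the `self.roberta` encoder that `RobertaForSequenceClassification` already creates and that `forward` actually calls.

Below is a minimal pure-PyTorch sketch of the out-of-place vs in-place difference. It uses a stand-in `nn.Embedding` rather than the real RoBERTa layer, which is an assumption made for brevity.

```python
import torch
from torch import nn

torch.manual_seed(0)
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)  # stand-in for word_embeddings
original = embedding.weight.detach().clone()

# Out-of-place: builds a NEW tensor; the parameter (and every later lookup) is unchanged.
with torch.no_grad():
    noisy_weights = embedding.weight + torch.normal(2.0, 2.0, size=embedding.weight.size())
unchanged = torch.equal(embedding.weight, original)

# In-place: modifies the parameter itself, so subsequent lookups see the noise.
with torch.no_grad():
    embedding.weight.add_(torch.normal(2.0, 2.0, size=embedding.weight.size()))
changed = not torch.equal(embedding.weight, original)

print(unchanged, changed)  # True True
```

Note that adding noise to the weights in place makes it permanent: it accumulates over steps and the optimizer trains on top of it. For a jammer that acts on each forward pass, it is probably more natural to perturb the looked-up embedding vectors instead of the weight matrix.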

Questions

  • Is this the right way to do it? I want to add noise on every forward pass, as if a jammer were sitting there and adding noise to it
  • Why does performance degrade when I pass inputs_embeds instead of input_ids?
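Here is a minimal sketch of per-forward-pass noise. It assumes the stock `RobertaForSequenceClassification` (not the custom class) and Gaussian noise. `forward_with_noisy_embeddings` is a hypothetical helper name.

The helper looks up only the word embeddings, with gradient tracking, adds fresh noise, and passes the result as `inputs_embeds`. RoBERTa then adds the position and token-type embeddings and the LayerNorm itself. With zero noise and no padding tokens in `input_ids`, this path should match the `input_ids` path in eval mode.

Two plausible causes of the degradation follow from this, though both are unverified assumptions:

  • computing the lookup under `torch.no_grad()`, which stops the embedding layer from training
  • passing the full `model.roberta.embeddings(...)` output, which would add position embeddings and LayerNorm a second time

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification


def forward_with_noisy_embeddings(model, input_ids, attention_mask=None, labels=None,
                                  mean=0.0, std=0.0):
    """Hypothetical helper: run the classifier with noise added to its word embeddings.

    Fresh noise is sampled on every call, like a jammer acting on each forward pass.
    Only the word embeddings are passed as inputs_embeds; RoBERTa adds position and
    token-type embeddings (and applies LayerNorm) itself.
    """
    # Keep gradient tracking so the embedding matrix is still trained.
    word_embeds = model.roberta.embeddings.word_embeddings(input_ids)
    noise = torch.randn_like(word_embeds) * std + mean
    return model(inputs_embeds=word_embeds + noise,
                 attention_mask=attention_mask, labels=labels)


# Tiny randomly initialised model so the sketch runs without downloading roberta-base.
torch.manual_seed(0)
config = RobertaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                       num_attention_heads=4, intermediate_size=64,
                       max_position_embeddings=64, num_labels=2)
model = RobertaForSequenceClassification(config).eval()  # eval() disables dropout
input_ids = torch.randint(3, 100, (2, 10))  # ids >= 3 avoid the <pad> id (1)

with torch.no_grad():
    reference = model(input_ids=input_ids).logits
    zero_noise = forward_with_noisy_embeddings(model, input_ids, std=0.0).logits
    jammed = forward_with_noisy_embeddings(model, input_ids, mean=2.0, std=2.0).logits
```

For training, put the model in `.train()` mode and pass `labels` so the returned output carries a loss. To use this with `Trainer`, the same logic would go inside an overridden `forward`.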

Thank you so much! Looking forward to more insights!


We now support adding noise to the embeddings with the NEFTune method in the Trainer (see the Trainer documentation).
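NEFTune adds uniform noise to the embedding outputs during training. The noise is drawn from U(-α/√(L·d), +α/√(L·d)), where L is the sequence length and d is the embedding dimension. In recent transformers versions this is switched on with `TrainingArguments(neftune_noise_alpha=...)`.

Below is a minimal pure-PyTorch sketch of the same idea as a forward hook. `make_neftune_style_hook` is a hypothetical name, not the library's internal function, and a stand-in `nn.Embedding` replaces the RoBERTa layer.

```python
import math

import torch
from torch import nn


def make_neftune_style_hook(alpha: float):
    """Hypothetical helper: forward hook adding NEFTune-style noise to embedding outputs."""

    def hook(module: nn.Module, inputs, output: torch.Tensor) -> torch.Tensor:
        if not module.training:
            return output  # like NEFTune, no noise at evaluation time
        seq_len, dim = output.size(1), output.size(2)
        magnitude = alpha / math.sqrt(seq_len * dim)
        noise = torch.empty_like(output).uniform_(-magnitude, magnitude)
        # Returning a tensor from a forward hook replaces the module's output.
        return output + noise

    return hook


embedding = nn.Embedding(num_embeddings=100, embedding_dim=16)  # stand-in embedding layer
handle = embedding.register_forward_hook(make_neftune_style_hook(alpha=5.0))
ids = torch.randint(0, 100, (2, 8))
# For RoBERTa (assumption): model.get_input_embeddings().register_forward_hook(...)
```

Because the hook runs on every forward pass, it also fits the "jammer" setting from the question. NEFTune only perturbs in training mode, so a jammer that should also act during evaluation would drop the `module.training` check.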

It's also available in the SFTTrainer class of the TRL library, since adding noise to embeddings was shown to improve supervised fine-tuning (SFT) of LLMs (see the Supervised Fine-tuning Trainer documentation).

Thank you!!