Adding dropout in a custom model, but setting dropout through .from_pretrained()

Hello, I need to create a custom model for my research using the Hugging Face PreTrainedModel. I was wondering what happens when I set my custom dropout in __init__, but then change hidden_dropout_prob and attention_probs_dropout_prob when calling the model with .from_pretrained() (or through the model config). To show what I mean, I will put a little of my code here.

This is my model, where I assign self.dropout a probability of 0.5 via config.DROPOUT:

import torch
import torch.nn as nn
from transformers import AutoConfig, AutoModel, AutoTokenizer, PreTrainedModel

# `config` and `consts` are my own project modules
# (config.MODEL, config.DROPOUT, consts.E1_START_TOKEN, consts.E2_START_TOKEN)


class RelationExtractionModel(PreTrainedModel):
    config_class = AutoConfig

    def __init__(self, model_config: AutoConfig, tokenizer: AutoTokenizer):
        super().__init__(model_config)
        self.model: AutoModel = AutoModel.from_pretrained(config.MODEL, config=model_config)
        self.model.resize_token_embeddings(len(tokenizer))
        self.tokenizer = tokenizer

        # HERE
        self.dropout = nn.Dropout(config.DROPOUT)
        #
        self.classifier = nn.Linear(model_config.hidden_size * 3, model_config.num_labels)

        self.e1_start_id = tokenizer.convert_tokens_to_ids(consts.E1_START_TOKEN)
        self.e2_start_id = tokenizer.convert_tokens_to_ids(consts.E2_START_TOKEN)
        self.cls_token_id = tokenizer.cls_token_id

    def forward(self, input_ids, attention_mask, labels=None, token_type_ids=None):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = outputs.last_hidden_state

     
        # Pick out the hidden state at each entity-marker position
        # (each marker token appears once per sequence, so the masked sum extracts it)
        e1_mask = (input_ids == self.e1_start_id).unsqueeze(-1).expand(sequence_output.size())
        entity_a = torch.sum(sequence_output * e1_mask, dim=1)

        e2_mask = (input_ids == self.e2_start_id).unsqueeze(-1).expand(sequence_output.size())
        entity_b = torch.sum(sequence_output * e2_mask, dim=1)

        cls_mask = (input_ids == self.cls_token_id).unsqueeze(-1).expand(sequence_output.size())
        cls_embedding = torch.sum(sequence_output * cls_mask, dim=1)

        embedding = torch.cat([entity_a, entity_b, cls_embedding], dim=1)
        embedding = self.dropout(embedding)

        logits = self.classifier(embedding)

        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)

        return {"loss": loss, "logits": logits} if labels is not None else {"logits": logits}

and call the model like this:

from utils.RE_utils.CERED.RE_model import RelationExtractionModel
model = RelationExtractionModel.from_pretrained(config.MODEL, tokenizer=tokenizer,
                                                num_labels=len(id2label), 
                                                label2id=label2id, id2label=id2label,
                                                hidden_dropout_prob=0.25,
                                                attention_probs_dropout_prob=0.25)

where I put different values on purpose to make the difference clearer.
My assumption is that the dropout on the hidden layers and on the attention probabilities will end up matching the dropout I assigned in __init__, but I am not sure.


OK, I don’t fully understand it either. Your code seems to be correct, but when I asked Hugging Chat, it pointed out a few possible problems.


When working with Hugging Face’s PreTrainedModel and custom dropout layers, it’s important to understand how dropout probabilities are applied during model initialization and fine-tuning. Here’s a breakdown of the situation you described and potential implications:


Key Points to Consider

  1. Custom Dropout Layer Initialization
    In your RelationExtractionModel class, you explicitly define a custom dropout layer with a fixed dropout probability of 0.5:

    self.dropout = nn.Dropout(config.DROPOUT)
    

    This dropout layer is applied to the concatenated embeddings before the final classification step. This is a separate dropout layer from the dropout layers defined in the base transformer model (e.g., hidden_dropout_prob and attention_probs_dropout_prob).

  2. Setting Dropout Probabilities via from_pretrained()
    When you call RelationExtractionModel.from_pretrained(), you are passing custom dropout probabilities (hidden_dropout_prob=0.25, attention_probs_dropout_prob=0.25) to the model. These values:

    • Modify the dropout probabilities in the transformer model’s configuration.
    • Update the dropout layers inside the transformer model (e.g., the dropout applied after attention and hidden layers), since that modified config is exactly the one your __init__ forwards to AutoModel.from_pretrained() (see the verification sketch right after this list).
  3. Coexistence of Custom Dropout and Transformer Dropout
    The transformer model’s dropout layers (with the new probabilities) and your custom dropout layer (with a fixed 0.5 probability) will both be active during training. This means:

    • The transformer model will apply dropout to its internal computations (e.g., attention and hidden states).
    • Your custom dropout will be applied to the concatenated embeddings before classification.
  4. Potential Issues

    • Over-Dropout: Applying multiple dropout layers (transformer dropout and custom dropout) could lead to excessive dropout, reducing the model’s performance. Be cautious with the total dropout rate.
    • Inconsistent Dropout During Inference: Ensure that dropout is disabled during inference by calling model.eval(); this switches off both the transformer’s internal dropout layers and your custom one.
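
To make points 2 and 3 concrete, here is a minimal sketch that checks which dropout values actually end up active. It assumes model is the instance created by the from_pretrained() call shown in the original post (i.e., a BERT-style checkpoint whose config has hidden_dropout_prob and attention_probs_dropout_prob):

    import torch.nn as nn

    # The inner transformer was built from the config that from_pretrained() assembled,
    # so its dropout probabilities reflect the keyword arguments:
    print(model.model.config.hidden_dropout_prob)           # 0.25
    print(model.model.config.attention_probs_dropout_prob)  # 0.25

    # The custom head dropout keeps the value hard-coded in __init__:
    print(model.dropout.p)                                   # 0.5 (config.DROPOUT)

    # model.eval() / model.train() toggle every nn.Dropout module at once:
    model.eval()
    print(any(m.training for m in model.modules() if isinstance(m, nn.Dropout)))  # False

If the first two prints show 0.25 while model.dropout.p stays at 0.5, both kinds of dropout are active during training, each with its own independent probability.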

Clarifications

  • Transformer Dropout (hidden_dropout_prob and attention_probs_dropout_prob):
    These dropout probabilities affect the transformer model’s internal dropout layers. They are updated when you call from_pretrained() with the new probabilities.

  • Custom Dropout Layer:
    Your custom dropout layer, defined in the __init__ method, is independent of the transformer’s dropout layers. It will retain its dropout probability of 0.5, regardless of the values passed to from_pretrained(). (If you want it to be configurable as well, see the sketch right after this list.)
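
If you would rather control the head dropout through from_pretrained() as well, one option is to read it from the config instead of hard-coding config.DROPOUT. This is only a sketch, not the library’s prescribed way: head_dropout is an illustrative custom attribute that you set on the config yourself before loading the model (again assuming a BERT-style config).

    # in __init__, falling back to the old constant if the attribute is missing:
    self.dropout = nn.Dropout(getattr(model_config, "head_dropout", config.DROPOUT))

    # at load time: build the config first, add the custom field, then pass it in
    from transformers import AutoConfig

    model_config = AutoConfig.from_pretrained(config.MODEL,
                                              num_labels=len(id2label),
                                              label2id=label2id, id2label=id2label,
                                              hidden_dropout_prob=0.25,
                                              attention_probs_dropout_prob=0.25)
    model_config.head_dropout = 0.25
    model = RelationExtractionModel.from_pretrained(config.MODEL, config=model_config,
                                                    tokenizer=tokenizer)

Because the attribute lives on the config object, it should also end up in config.json when the model is saved, so the value travels with the checkpoint.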


Recommendations

  1. Adjust Custom Dropout Probability:
    Since the transformer model’s dropout has been reduced to 0.25, you may want to adjust the custom dropout layer to a lower value (e.g., 0.2) to avoid over-dropping.

  2. Monitor Model Behavior:
    Experiment with different dropout combinations and monitor the model’s performance during training and validation to ensure that it generalizes well.

  3. Documentation:
    Refer to Hugging Face’s official documentation on fine-tuning models and customizing model architectures.

  4. Seed for Reproducibility:
    Ensure consistent results by setting a random seed when experimenting with different dropout values, for example with transformers.set_seed as sketched below.
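
For the seeding part, a one-liner with the set_seed helper from Transformers (it seeds Python’s random, NumPy and PyTorch, including CUDA) is usually enough:

    from transformers import set_seed

    set_seed(42)  # call once at the start of the script so dropout masks are drawn reproducibly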


Example of Adjusted Custom Dropout

If you decide to adjust the custom dropout probability, update the __init__ method in your RelationExtractionModel class:

self.dropout = nn.Dropout(0.2)  # Reduced from 0.5

By carefully managing dropout rates, you can balance regularization and model performance in your custom architecture.
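
One quick way to audit the combined regularization is to list every nn.Dropout module in the assembled model together with its probability. A small sketch, assuming model is built as in the original post:

    import torch.nn as nn

    for name, module in model.named_modules():
        if isinstance(module, nn.Dropout):
            print(name, module.p)

This prints the transformer’s internal dropout layers (0.25 in the example above) alongside the custom head dropout, so you can see at a glance where and how strongly dropout is applied.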

