How to edit classes in transformers and have the library installed with the changes?

I want to edit some classes in transformers, for example BertEmbeddings https://github.com/huggingface/transformers/blob/4c32f9f26e6a84f0d9843fec8757e6ce640bb44e/src/transformers/models/bert/modeling_bert.py#L166, and pre-train BERT from scratch on a custom dataset. But I am stuck on how to make the edits take effect.

The process I am following is:

  • Clone the repository: git clone https://github.com/huggingface/transformers.git
  • Edit the classes I need to change
  • Install transformers with:
cd transformers
pip install -e .
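As a quick sanity check on which copy of the package Python actually picks up (just a diagnostic, assuming a standard environment), I can run:

import transformers

# With an editable install (pip install -e .) this should point inside the
# cloned repo (e.g. .../transformers/src/transformers/__init__.py); with a
# regular install it points into site-packages.
print(transformers.__file__)
print(transformers.__version__)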

The problem is that I cannot load any model or tokenizer with imports like:

from transformers import BertModel

The error message shows:

ImportError: cannot import name 'BertModel' from 'transformers' (unknown location)

while import transformers works perfectly fine.

My questions are:

  • How do I import BertTokenizer or BertModel from the edited source?
  • Is there a better way to achieve what I am trying to do than my approach?

I could be way off, so any helpful suggestion is appreciated. Thanks!

Note: I am trying to do something like this: How to use additional input features for NER? - #2 by nielsr

To clarify the problem:
Usually, when we want to use transformers in its original form, we do:

!pip install transformers
import transformers
from transformers import BertTokenizer, BertModel

If I want to change a class or two, what changes should I make to the above code snippet to import BertTokenizer and BertModel? I am assuming that cloning the repository and making the edits in the desired file is fine.

Apparently, the problem was the editable install.

This works:

cd transformers
pip install .

Editable mode was not what I thought it was at first.
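After the non-editable install the imports work again, and a quick check like the following (just a sketch) shows which file the class is actually loaded from:

import inspect
from transformers import BertModel, BertTokenizer
from transformers.models.bert.modeling_bert import BertEmbeddings

# With a non-editable install this path is under site-packages, i.e. a copy of
# the cloned sources, so `pip install .` has to be re-run after further edits.
print(inspect.getsourcefile(BertEmbeddings))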


Hello @bengul
Is there any other way to achieve the same result without cloning the Hugging Face repo and making changes in the code? For ease of reproducibility, it would be better if, say, I create a class that inherits from the original class I want to change (in your case BertEmbeddings) and then somehow make from_pretrained go through my subclass rather than the original.

Initially, my thought was to do exactly what you proposed, but I was not sure how to achieve it. I am sure it is possible, and it is probably the better approach.
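Something along these lines is what I had in mind (a rough, untested sketch; MyBertEmbeddings is just a placeholder name, and the actual customization is left as a comment):

from transformers import BertModel
from transformers.models.bert.modeling_bert import BertEmbeddings


class MyBertEmbeddings(BertEmbeddings):
    """Inherit from the original BertEmbeddings and override only what changes."""

    def forward(self, *args, **kwargs):
        # Custom logic would go here (e.g. adding extra feature embeddings,
        # as in the NER thread linked above); for now just delegate.
        return super().forward(*args, **kwargs)


# Load the stock model with from_pretrained as usual ...
model = BertModel.from_pretrained("bert-base-uncased")

# ... then swap in the subclass, copying over the pretrained embedding weights.
custom_embeddings = MyBertEmbeddings(model.config)
custom_embeddings.load_state_dict(model.embeddings.state_dict())
model.embeddings = custom_embeddings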
