How to fine-tune BERT on entity recognition?

  • I have a paragraph, for example the one below:
    Either party may terminate this Agreement by written notice at any time if the other party defaults in the performance of its material obligations hereunder. In the event of such default, the party declaring the default shall provide the defaulting party with written notice setting forth the nature of the default, and the defaulting party shall have thirty (30) days to cure the default. If after such 30 day period the default remains uncured, the aggrieved party may terminate this Agreement by written notice to the defaulting party, which notice shall be effective upon receipt.

and then I need the Entity label and Entity value:

Entity value = thirty (30) days
Entity label = Termination Notice Period

I want to frame this as an entity recognition task, so could you please tell me how you would approach it?

Named-entity recognition (NER) is typically solved as a sequence tagging task, i.e. the model is trained to predict a label for every word. NER datasets are usually annotated using the IOB format (or one of its variants, like BIOES). Let's take the example sentence from your paragraph. It would have to be annotated as follows:

the O
defaulting O
party O
shall O
have O
thirty B-TER
(30) I-TER
days I-TER
to O
cure O 
the O
default O
. O

In other words, we annotate each word as being either outside a named entity ("O"), inside a named entity ("I-TER"), or at the beginning of a named entity ("B-TER").
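If your annotations are stored as (span text, label) pairs like your example, a small helper can turn them into IOB word tags. This is just a minimal sketch (not part of any library), assuming whitespace tokenization and that the span text appears verbatim in the sentence:

sentence = "the defaulting party shall have thirty (30) days to cure the default ."
span_text = "thirty (30) days"
label = "TER"  # short tag standing in for "Termination Notice Period"

words = sentence.split()
span_words = span_text.split()
tags = ["O"] * len(words)

# find the first occurrence of the span and mark it with B-/I- tags
for start in range(len(words) - len(span_words) + 1):
    if words[start:start + len(span_words)] == span_words:
        tags[start] = f"B-{label}"
        for i in range(start + 1, start + len(span_words)):
            tags[i] = f"I-{label}"
        break

for word, tag in zip(words, tags):
    print(word, tag)

This prints exactly the word/tag pairs shown above.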

However, there's one additional challenge: models like BERT operate on subword tokens rather than words, meaning that a word like "hello" might be tokenized into ["hel", "lo"]. This means that one should actually label all tokens rather than all words, as BERT will be trained to predict a label for every token. There are multiple strategies here: one could either propagate the label to all subword tokens of a word, or only label the first subword token of a given word.

You can take a look at my example notebooks that illustrate how to fine-tune BERT for NER.

I didn't understand this. Could you please explain the strategies with an example as well?

Suppose that I would like to label "Niels" as a person, and that the original IOB annotation looks as follows:

Niels B-PER

When we tokenize “Niels” using BertTokenizer, we get:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "Niels"
input_ids = tokenizer(text).input_ids
for token_id in input_ids:
    print(token_id, tokenizer.decode([token_id]))

This prints:

101 [CLS]
9152 ni
9050 ##els
102 [SEP]

As you can see, the word “Niels” has been tokenized into 2 tokens, namely “ni” and “##els”. The [CLS] and [SEP] tokens are special tokens which BERT uses by default - let’s ignore those for now. Suppose that the label index for B-PER is 1.

So now you have a choice: either you label both "ni" and "##els" with label index 1, or you only label the first subword token "ni" with 1 and the second one with -100. The latter ensures that no loss will be computed for the second subword token.
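To make the two strategies concrete, here is a minimal sketch (not from the notebook) using the fast tokenizer, whose word_ids() method tells you which word each token came from. Swap the commented line to switch between the two strategies:

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

words = ["Niels"]
word_labels = [1]  # label index of B-PER for each word

encoding = tokenizer(words, is_split_into_words=True)

labels = []
previous_word_idx = None
for word_idx in encoding.word_ids():
    if word_idx is None:
        labels.append(-100)  # special tokens like [CLS] and [SEP] are ignored by the loss
    elif word_idx != previous_word_idx:
        labels.append(word_labels[word_idx])  # first subword token gets the word's label
    else:
        labels.append(-100)  # strategy 2: ignore the remaining subword tokens
        # strategy 1 would append word_labels[word_idx] here instead
    previous_word_idx = word_idx

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))  # ['[CLS]', 'ni', '##els', '[SEP]']
print(labels)  # [-100, 1, -100, -100]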

Thanks, I understood. I am running your code on my Colab, and when I run the code below:

def train(epoch):
    tr_loss, tr_accuracy = 0, 0
    nb_tr_examples, nb_tr_steps = 0, 0
    tr_preds, tr_labels = [], []
    # put model in training mode
    model.train()

    for idx, batch in enumerate(training_loader):

        ids = batch['ids']
        mask = batch['mask']
        targets = batch['targets']

        loss, tr_logits = model(input_ids=ids, attention_mask=mask, labels=targets)
        tr_loss += loss.items()

        nb_tr_steps += 1
        nb_tr_examples += targets.size(0)

        if idx % 100==0:
            loss_step = tr_loss/nb_tr_steps
            print(f"Training loss per 100 training steps: {loss_step}")

        # compute training accuracy
        flattened_targets = targets.view(-1) # shape (batch_size * seq_len,)
        active_logits = tr_logits.view(-1, model.num_labels) # shape (batch_size * seq_len, num_labels)
        flattened_predictions = torch.argmax(active_logits, axis=1) # shape (batch_size * seq_len,)
        # now, use mask to determine where we should compare predictions with targets (includes [CLS] and [SEP] token predictions)
        active_accuracy = mask.view(-1) == 1 # active accuracy is also of shape (batch_size * seq_len,)
        targets = torch.masked_select(flattened_targets, active_accuracy)
        predictions = torch.masked_select(flattened_predictions, active_accuracy)

        tr_preds.extend(predictions)
        tr_labels.extend(targets)

        tmp_tr_accuracy = accuracy_score(targets.cpu().numpy(), predictions.cpu().numpy())
        tr_accuracy += tmp_tr_accuracy

        # gradient clipping
        torch.nn.utils.clip_grad_norm_(
            parameters=model.parameters(), max_norm=MAX_GRAD_NORM
        )

        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    epoch_loss = tr_loss / nb_tr_steps
    tr_accuracy = tr_accuracy / nb_tr_steps
    print(f"Training loss epoch: {epoch_loss}")
    print(f"Training accuracy epoch: {tr_accuracy}")

I am getting the below error,

     14         loss, tr_logits = model(input_ids=ids, attention_mask=mask, labels=targets)
---> 15         tr_loss += loss.items()
     16 
     17         nb_tr_steps += 1

AttributeError: 'str' object has no attribute 'items'

The only change I made is that I removed .to(device), because it was giving this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument index in method wrapper_index_select)

@nielsr Can you take a look at the above error?

Hey, could you share a reproducible notebook on Colab or Kaggle?


I am just following this notebook: Transformers-Tutorials/Custom_Named_Entity_Recognition_with_BERT.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub

and I only made the small changes I listed above!

Try changing from:

loss, tr_logits = model(input_ids=ids, attention_mask=mask, labels=targets)

to:

outputs = model(input_ids=ids, attention_mask=mask, labels=targets)
loss = outputs["loss"]
tr_logits = outputs["logits"]
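For context: this is needed because recent transformers versions return a ModelOutput (a dict-like object) by default instead of a plain tuple, so tuple-unpacking the output gives you its keys (the strings "loss" and "logits"), which is what the 'str' error is complaining about. You can access the fields by key, as above, or equivalently by attribute:

loss = outputs.loss
tr_logits = outputs.logits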

Maybe this can help: Bert ner classifier

Hey,

def train(epoch):
    tr_loss, tr_accuracy = 0, 0
    nb_tr_examples, nb_tr_steps = 0, 0
    tr_preds, tr_labels = [], []
    # put model in training mode
    model.train()

    for idx, batch in enumerate(training_loader):

        ids = batch['ids']
        mask = batch['mask']
        targets = batch['targets']

        outputs = model(input_ids=ids, attention_mask=mask, labels=targets)
        loss = outputs["loss"]
        tr_logits = outputs["logits"]
        tr_loss += loss.items()

        nb_tr_steps += 1
        nb_tr_examples += targets.size(0)

        if idx % 100==0:
            loss_step = tr_loss/nb_tr_steps
            print(f"Training loss per 100 training steps: {loss_step}")

        # compute training accuracy
        flattened_targets = targets.view(-1) # shape (batch_size * seq_len,)
        active_logits = tr_logits.view(-1, model.num_labels) # shape (batch_size * seq_len, num_labels)
        flattened_predictions = torch.argmax(active_logits, axis=1) # shape (batch_size * seq_len,)
        # now, use mask to determine where we should compare predictions with targets (includes [CLS] and [SEP] token predictions)
        active_accuracy = mask.view(-1) == 1 # active accuracy is also of shape (batch_size * seq_len,)
        targets = torch.masked_select(flattened_targets, active_accuracy)
        predictions = torch.masked_select(flattened_predictions, active_accuracy)

        tr_preds.extend(predictions)
        tr_labels.extend(targets)

        tmp_tr_accuracy = accuracy_score(targets.cpu().numpy(), predictions.cpu().numpy())
        tr_accuracy += tmp_tr_accuracy

        # gradient clipping
        torch.nn.utils.clip_grad_norm_(
            parameters=model.parameters(), max_norm=MAX_GRAD_NORM
        )

        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    epoch_loss = tr_loss / nb_tr_steps
    tr_accuracy = tr_accuracy / nb_tr_steps
    print(f"Training loss epoch: {epoch_loss}")
    print(f"Training accuracy epoch: {tr_accuracy}")

for epoch in range(EPOCHS):
    print(f"Training epoch: {epoch + 1}")
    train(epoch)

Error:

     15         loss = outputs["loss"]
     16         tr_logits = outputs["logits"]
---> 17         tr_loss += loss.items()
     18 
     19         nb_tr_steps += 1

AttributeError: 'Tensor' object has no attribute 'items'

@Emanuel You can see the Colab notebook I am running: Google Colab

Change from loss.items() to loss.item()
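(loss is a zero-dimensional tensor; Tensor.item() returns its value as a plain Python number, whereas .items() is a dict method, hence the AttributeError.)

tr_loss += loss.item()  # .item() extracts the Python float from the 0-dim loss tensor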

Thanks, it worked! I have now trained the model and saved it (both the tokenizer and the model). I am completely new to Hugging Face, so how do I load the model and make predictions? @Emanuel

I think you can try loading with:

from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained('path/to/your/model')

(Using the ForTokenClassification class makes sure the token-classification head you trained is loaded along with the base model.)
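If you want to run predictions without the pipeline, a minimal sketch could look like this (assuming you saved both model and tokenizer into the same directory with save_pretrained(); 'path/to/your/model' is a placeholder):

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_dir = "path/to/your/model"  # the directory created by save_pretrained()
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForTokenClassification.from_pretrained(model_dir)
model.eval()

text = "the defaulting party shall have thirty (30) days to cure the default ."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# pick the highest-scoring label for each token and map it back to its name
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred_id in zip(tokens, predicted_ids):
    print(token, model.config.id2label[pred_id.item()])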

A quick way to make predictions with your model / tokenizer is with the pipeline() function, e.g.

from transformers import pipeline

# Note: the model and tokenizer directories are usually the same
ner_tagger = pipeline("ner", model="path/to/your/model/dir", tokenizer="path/to/your/tokenizer/dir")

text = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

entities = ner_tagger(text)
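For reference, entities comes back as a list of dicts (one per tagged token) with keys such as entity, score, word, start and end. On recent transformers versions you can also let the pipeline merge subword tokens into whole entities:

ner_tagger = pipeline(
    "ner",
    model="path/to/your/model/dir",
    tokenizer="path/to/your/tokenizer/dir",
    aggregation_strategy="simple",  # groups subword tokens into complete entities
)
entities = ner_tagger(text)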

I am doing the same, but see what I am getting @lewtun:

from transformers import pipeline

# Note: the model and tokenizer directories are usually the same
ner_tagger = pipeline("ner", model="E:\model\config.json", tokenizer="E:\model\vocab.txt")

text = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

entities = ner_tagger(text)

ValueError: Could not load model E:\model\config.json with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForTokenClassification'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForTokenClassification'>, <class 'transformers.models.bert.modeling_bert.BertForTokenClassification'>, <class 'transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification'>).

Hey @ayush488, the model and tokenizer arguments should point to the directory where you saved the model / tokenizer with the save_pretrained() method. In other words, does the following work?

from transformers import pipeline

# Note: the model and tokenizer directories are usually the same
ner_tagger = pipeline("ner", model="E:\model", tokenizer="E:\model")

text = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

entities = ner_tagger(text)
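For completeness, that directory is the one produced on the saving side, e.g. (the E:\model path is just an example):

# after training; both calls can point at the same directory
model.save_pretrained(r"E:\model")
tokenizer.save_pretrained(r"E:\model")

After that, the directory contains config.json, the model weights and the tokenizer files, and the pipeline can load everything from it.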

I did the same but got an error: ValueError: unable to parse E:\model\model\config.json as a URL or as a local path

Hmm, the error suggests that the pipeline is looking for a nested directory like model\model. Do you have all the model files in a subdirectory?