How to fine-tune BERT on entity recognition?

  • I have a paragraph, for example the one below:
    Either party may terminate this Agreement by written notice at any time if the other party defaults in the performance of its material obligations hereunder. In the event of such default, the party declaring the default shall provide the defaulting party with written notice setting forth the nature of the default, and the defaulting party shall have thirty (30) days to cure the default. If after such 30 day period the default remains uncured, the aggrieved party may terminate this Agreement by written notice to the defaulting party, which notice shall be effective upon receipt.

and then I need the Entity label and Entity value:

Entity value = thirty (30) days
Entity label = Termination Notice Period

I want to frame this as an entity recognition task, so could you please tell me how you would approach it?

Named-entity recognition (NER) is typically solved as a sequence tagging task, i.e. the model is trained to predict a label for every word. NER datasets are usually annotated using the IOB format (or one of its variants, like BIOES). Let's take the example sentence from your paragraph. It would have to be annotated as follows:

the O
defaulting O
party O
shall O
have O
thirty B-TER
(30) I-TER
days I-TER
to O
cure O 
the O
default O
. O

In other words, we annotate each word as being either outside a named entity ("O"), inside a named entity ("I-TER"), or at the beginning of a named entity ("B-TER").
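If your annotations are stored as (span text, label) pairs like your example, a small helper can turn them into IOB word tags. This is just a minimal sketch (not part of any library), assuming whitespace tokenization and that the span text appears verbatim in the sentence:

sentence = "the defaulting party shall have thirty (30) days to cure the default ."
span_text = "thirty (30) days"
label = "TER"  # short tag standing in for "Termination Notice Period"

words = sentence.split()
span_words = span_text.split()
tags = ["O"] * len(words)

# find the first occurrence of the span and mark it with B-/I- tags
for start in range(len(words) - len(span_words) + 1):
    if words[start:start + len(span_words)] == span_words:
        tags[start] = f"B-{label}"
        for i in range(start + 1, start + len(span_words)):
            tags[i] = f"I-{label}"
        break

for word, tag in zip(words, tags):
    print(word, tag)

This prints exactly the word/tag pairs shown above.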

However, there's one additional challenge: models like BERT operate on subword tokens rather than words, meaning that a word like "hello" might be tokenized into ["hel", "lo"]. This means that one should actually label all tokens rather than all words, as BERT will be trained to predict a label for every token. There are multiple strategies here: one could either propagate the label to all subword tokens of a word, or only label the first subword token of a given word.

You can take a look at my example notebooks that illustrate how to fine-tune BERT for NER.

I didn't understand this. Could you please explain the strategies with an example as well?

Suppose that I would like to label "Niels" as a person, and that the original IOB annotation looks as follows:

Niels B-PER

When we tokenize “Niels” using BertTokenizer, we get:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "Niels"
input_ids = tokenizer(text).input_ids
for token_id in input_ids:
    print(token_id, tokenizer.decode([token_id]))

This prints:

101 [CLS]
9152 ni
9050 ##els
102 [SEP]

As you can see, the word “Niels” has been tokenized into 2 tokens, namely “ni” and “##els”. The [CLS] and [SEP] tokens are special tokens which BERT uses by default - let’s ignore those for now. Suppose that the label index for B-PER is 1.

So now you have a choice: either you label both "ni" and "##els" with label index 1, or you only label the first subword token "ni" with 1 and the second one with -100. The latter ensures that no loss will be computed for the second subword token.
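To make the two strategies concrete, here is a minimal sketch (not from the notebook) using the fast tokenizer, whose word_ids() method tells you which word each token came from. Swap the commented line to switch between the two strategies:

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

words = ["Niels"]
word_labels = [1]  # label index of B-PER for each word

encoding = tokenizer(words, is_split_into_words=True)

labels = []
previous_word_idx = None
for word_idx in encoding.word_ids():
    if word_idx is None:
        labels.append(-100)  # special tokens like [CLS] and [SEP] are ignored by the loss
    elif word_idx != previous_word_idx:
        labels.append(word_labels[word_idx])  # first subword token gets the word's label
    else:
        labels.append(-100)  # strategy 2: ignore the remaining subword tokens
        # strategy 1 would append word_labels[word_idx] here instead
    previous_word_idx = word_idx

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))  # ['[CLS]', 'ni', '##els', '[SEP]']
print(labels)  # [-100, 1, -100, -100]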

Thanks, I understood. I am running your code on my Colab, and when I run the code below:

def train(epoch):
    tr_loss, tr_accuracy = 0, 0
    nb_tr_examples, nb_tr_steps = 0, 0
    tr_preds, tr_labels = [], []
    # put model in training mode
    model.train()

    for idx, batch in enumerate(training_loader):

        ids = batch['ids']
        mask = batch['mask']
        targets = batch['targets']

        loss, tr_logits = model(input_ids=ids, attention_mask=mask, labels=targets)
        tr_loss += loss.items()

        nb_tr_steps += 1
        nb_tr_examples += targets.size(0)

        if idx % 100==0:
            loss_step = tr_loss/nb_tr_steps
            print(f"Training loss per 100 training steps: {loss_step}")

        # compute training accuracy
        flattened_targets = targets.view(-1) # shape (batch_size * seq_len,)
        active_logits = tr_logits.view(-1, model.num_labels) # shape (batch_size * seq_len, num_labels)
        flattened_predictions = torch.argmax(active_logits, axis=1) # shape (batch_size * seq_len,)
        # now, use mask to determine where we should compare predictions with targets (includes [CLS] and [SEP] token predictions)
        active_accuracy = mask.view(-1) == 1 # active accuracy is also of shape (batch_size * seq_len,)
        targets = torch.masked_select(flattened_targets, active_accuracy)
        predictions = torch.masked_select(flattened_predictions, active_accuracy)

        tr_preds.extend(predictions)
        tr_labels.extend(targets)

        tmp_tr_accuracy = accuracy_score(targets.cpu().numpy(), predictions.cpu().numpy())
        tr_accuracy += tmp_tr_accuracy

        # gradient clipping
        torch.nn.utils.clip_grad_norm_(
            parameters=model.parameters(), max_norm=MAX_GRAD_NORM
        )

        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    epoch_loss = tr_loss / nb_tr_steps
    tr_accuracy = tr_accuracy / nb_tr_steps
    print(f"Training loss epoch: {epoch_loss}")
    print(f"Training accuracy epoch: {tr_accuracy}")

I am getting the below error,

     14         loss, tr_logits = model(input_ids=ids, attention_mask=mask, labels=targets)
---> 15         tr_loss += loss.items()
     16 
     17         nb_tr_steps += 1

AttributeError: 'str' object has no attribute 'items'

The only change I made is that I removed .to(device), because it was giving this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument index in method wrapper_index_select)

@nielsr Can you take a look at the above error?

Hey, could you share a reproducible notebook on Colab or Kaggle?


I am just following this notebook: Transformers-Tutorials/Custom_Named_Entity_Recognition_with_BERT.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub

and I only made the small changes I listed above!

Try changing from:

loss, tr_logits = model(input_ids=ids, attention_mask=mask, labels=targets)

to:

outputs = model(input_ids=ids, attention_mask=mask, labels=targets)
loss = outputs["loss"]
tr_logits = outputs["logits"]
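For context: this is needed because recent transformers versions return a ModelOutput (a dict-like object) by default instead of a plain tuple, so tuple-unpacking the output gives you its keys (the strings "loss" and "logits"), which is what the 'str' error is complaining about. You can access the fields by key, as above, or equivalently by attribute:

loss = outputs.loss
tr_logits = outputs.logits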

Maybe this can help: Bert ner classifier

Hey,

def train(epoch):
    tr_loss, tr_accuracy = 0, 0
    nb_tr_examples, nb_tr_steps = 0, 0
    tr_preds, tr_labels = [], []
    # put model in training mode
    model.train()

    for idx, batch in enumerate(training_loader):

        ids = batch['ids']
        mask = batch['mask']
        targets = batch['targets']

        outputs = model(input_ids=ids, attention_mask=mask, labels=targets)
        loss = outputs["loss"]
        tr_logits = outputs["logits"]
        tr_loss += loss.items()

        nb_tr_steps += 1
        nb_tr_examples += targets.size(0)

        if idx % 100==0:
            loss_step = tr_loss/nb_tr_steps
            print(f"Training loss per 100 training steps: {loss_step}")

        # compute training accuracy
        flattened_targets = targets.view(-1) # shape (batch_size * seq_len,)
        active_logits = tr_logits.view(-1, model.num_labels) # shape (batch_size * seq_len, num_labels)
        flattened_predictions = torch.argmax(active_logits, axis=1) # shape (batch_size * seq_len,)
        # now, use mask to determine where we should compare predictions with targets (includes [CLS] and [SEP] token predictions)
        active_accuracy = mask.view(-1) == 1 # active accuracy is also of shape (batch_size * seq_len,)
        targets = torch.masked_select(flattened_targets, active_accuracy)
        predictions = torch.masked_select(flattened_predictions, active_accuracy)

        tr_preds.extend(predictions)
        tr_labels.extend(targets)

        tmp_tr_accuracy = accuracy_score(targets.cpu().numpy(), predictions.cpu().numpy())
        tr_accuracy += tmp_tr_accuracy

        # gradient clipping
        torch.nn.utils.clip_grad_norm_(
            parameters=model.parameters(), max_norm=MAX_GRAD_NORM
        )

        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    epoch_loss = tr_loss / nb_tr_steps
    tr_accuracy = tr_accuracy / nb_tr_steps
    print(f"Training loss epoch: {epoch_loss}")
    print(f"Training accuracy epoch: {tr_accuracy}")

for epoch in range(EPOCHS):
    print(f"Training epoch: {epoch + 1}")
    train(epoch)

Error:

     15         loss = outputs["loss"]
     16         tr_logits = outputs["logits"]
---> 17         tr_loss += loss.items()
     18 
     19         nb_tr_steps += 1

AttributeError: 'Tensor' object has no attribute 'items'

@Emanuel You can see the Colab notebook I am running: Google Colab

Change from loss.items() to loss.item()
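(loss is a zero-dimensional tensor; Tensor.item() returns its value as a plain Python number, whereas .items() is a dict method, hence the AttributeError.)

tr_loss += loss.item()  # .item() extracts the Python float from the 0-dim loss tensor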

Thanks, it worked! I have now trained the model and saved it (both the tokenizer and the model). I am completely new to Hugging Face, so how do I load the model and make predictions? @Emanuel

I think you can try loading with:

from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained('path/to/your/model')

(Using the ForTokenClassification class makes sure the token-classification head you trained is loaded along with the base model.)
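If you want to run predictions without the pipeline, a minimal sketch could look like this (assuming you saved both model and tokenizer into the same directory with save_pretrained(); 'path/to/your/model' is a placeholder):

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_dir = "path/to/your/model"  # the directory created by save_pretrained()
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForTokenClassification.from_pretrained(model_dir)
model.eval()

text = "the defaulting party shall have thirty (30) days to cure the default ."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# pick the highest-scoring label for each token and map it back to its name
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred_id in zip(tokens, predicted_ids):
    print(token, model.config.id2label[pred_id.item()])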

A quick way to make predictions with your model / tokenizer is with the pipeline() function, e.g.

from transformers import pipeline

# Note: the model and tokenizer directories are usually the same
ner_tagger = pipeline("ner", model="path/to/your/model/dir", tokenizer="path/to/your/tokenizer/dir")

text = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

entities = ner_tagger(text)
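For reference, entities comes back as a list of dicts (one per tagged token) with keys such as entity, score, word, start and end. On recent transformers versions you can also let the pipeline merge subword tokens into whole entities:

ner_tagger = pipeline(
    "ner",
    model="path/to/your/model/dir",
    tokenizer="path/to/your/tokenizer/dir",
    aggregation_strategy="simple",  # groups subword tokens into complete entities
)
entities = ner_tagger(text)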

I am doing the same, but see what I am getting @lewtun:

from transformers import pipeline

# Note: the model and tokenizer directories are usually the same
ner_tagger = pipeline("ner", model="E:\model\config.json", tokenizer="E:\model\vocab.txt")

text = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

entities = ner_tagger(text)

ValueError: Could not load model E:\model\config.json with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForTokenClassification'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForTokenClassification'>, <class 'transformers.models.bert.modeling_bert.BertForTokenClassification'>, <class 'transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification'>).

Hey @ayush488, the model and tokenizer arguments should point to the directory where you saved the model / tokenizer with the save_pretrained() method. In other words, does the following work?

from transformers import pipeline

# Note: the model and tokenizer directories are usually the same
ner_tagger = pipeline("ner", model="E:\model", tokenizer="E:\model")

text = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

entities = ner_tagger(text)
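For completeness, that directory is the one produced on the saving side, e.g. (the E:\model path is just an example):

# after training; both calls can point at the same directory
model.save_pretrained(r"E:\model")
tokenizer.save_pretrained(r"E:\model")

After that, the directory contains config.json, the model weights and the tokenizer files, and the pipeline can load everything from it.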

I did the same but got an error: ValueError: unable to parse E:\model\model\config.json as a URL or as a local path

Hmm, the error suggests that the pipeline is looking for a nested directory like model\model. Do you have all the model files in a subdirectory?