How to fine-tune BERT on entity recognition?

Suppose that I would like to label “Niels” as person, and that the original IOB annotation looked as follows:

Niels B-PER

When we tokenize “Niels” using BertTokenizer, we get:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

text = "Niels"
input_ids = tokenizer(text).input_ids
for id in input_ids: print(id, tokenizer.decode([id]))

This prints:

101 [CLS]
9152 ni
9050 ##els
102 [SEP]

As you can see, the word “Niels” has been tokenized into 2 tokens, namely “ni” and “##els”. The [CLS] and [SEP] tokens are special tokens which BERT uses by default - let’s ignore those for now. Suppose that the label index for B-PER is 1.

So now you have a choice: either you label both “ni” and “##els” with label index 1, or you label only the first subword token “ni” with 1 and the second one with -100. The latter ensures that no loss is computed for the second subword token.
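
For illustration, here is a minimal sketch of that second strategy using the fast tokenizer’s word_ids() method (BertTokenizerFast, label index 1 for B-PER, and -100 everywhere else are just the assumptions from above):

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

words = ["Niels"]   # the pre-split words of one example
word_labels = [1]   # 1 = B-PER

encoding = tokenizer(words, is_split_into_words=True)

labels = []
previous_word_idx = None
for word_idx in encoding.word_ids():
    if word_idx is None:
        labels.append(-100)                   # special tokens like [CLS] and [SEP]
    elif word_idx != previous_word_idx:
        labels.append(word_labels[word_idx])  # first subword of a word gets the real label
    else:
        labels.append(-100)                   # remaining subwords are ignored by the loss
    previous_word_idx = word_idx

print(labels)  # [-100, 1, -100, -100] for [CLS], "ni", "##els", [SEP]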

Thanks, I understood. I am running your code on my Colab; when I run the code below:

def train(epoch):
    tr_loss, tr_accuracy = 0, 0
    nb_tr_examples, nb_tr_steps = 0, 0
    tr_preds, tr_labels = [], []
    # put model in training mode
    model.train()

    for idx, batch in enumerate(training_loader):

        ids = batch['ids']
        mask = batch['mask']
        targets = batch['targets']

        loss, tr_logits = model(input_ids=ids, attention_mask=mask, labels=targets)
        tr_loss += loss.items()

        nb_tr_steps += 1
        nb_tr_examples += targets.size(0)

        if idx % 100 == 0:
            loss_step = tr_loss / nb_tr_steps
            print(f"Training loss per 100 training steps: {loss_step}")

        # compute training accuracy
        flattened_targets = targets.view(-1) # shape (batch_size * seq_len,)
        active_logits = tr_logits.view(-1, model.num_labels) # shape (batch_size * seq_len, num_labels)
        flattened_predictions = torch.argmax(active_logits, axis=1) # shape (batch_size * seq_len,)
        # now, use mask to determine where we should compare predictions with targets (includes [CLS] and [SEP] token predictions)
        active_accuracy = mask.view(-1) == 1 # active accuracy is also of shape (batch_size * seq_len,)
        targets = torch.masked_select(flattened_targets, active_accuracy)
        predictions = torch.masked_select(flattened_predictions, active_accuracy)

        tr_preds.extend(predictions)
        tr_labels.extend(targets)

        tmp_tr_accuracy = accuracy_score(targets.cpu().numpy(), predictions.cpu().numpy())
        tr_accuracy += tmp_tr_accuracy

        # gradient clipping
        torch.nn.utils.clip_grad_norm_(
            parameters=model.parameters(), max_norm=MAX_GRAD_NORM
        )

        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    epoch_loss = tr_loss / nb_tr_steps
    tr_accuracy = tr_accuracy / nb_tr_steps
    print(f"Training loss epoch: {epoch_loss}")
    print(f"Training accuracy epoch: {tr_accuracy}")

I am getting the below error,

     14         loss, tr_logits = model(input_ids=ids, attention_mask=mask, labels=targets)
---> 15         tr_loss += loss.items()
     16 
     17         nb_tr_steps += 1

AttributeError: 'str' object has no attribute 'items'

The only change I made was removing .to(device), because it was giving this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_index_select)

@nielsr Can you take a look at the above error?

Hey, could you share a reproducible notebook on Colab or Kaggle?


I am just following this notebook: Transformers-Tutorials/Custom_Named_Entity_Recognition_with_BERT.ipynb at master · NielsRogge/Transformers-Tutorials · GitHub

and I have only made the small changes which I listed above!

In recent versions of Transformers, the model returns a ModelOutput (a dict-like object) rather than a plain tuple, so unpacking it directly gives you the string keys instead of the loss and logits tensors. Try changing from:

loss, tr_logits = model(input_ids=ids, attention_mask=mask, labels=targets)

to:

outputs = model(input_ids=ids, attention_mask=mask, labels=targets)
loss = outputs["loss"]
tr_logits = outputs["logits"]

Maybe this can help: Bert ner classifier

Hey, I made that change; here is the updated code:

def train(epoch):
    tr_loss, tr_accuracy = 0, 0
    nb_tr_examples, nb_tr_steps = 0, 0
    tr_preds, tr_labels = [], []
    # put model in training mode
    model.train()

    for idx, batch in enumerate(training_loader):

        ids = batch['ids']
        mask = batch['mask']
        targets = batch['targets']

        outputs = model(input_ids=ids, attention_mask=mask, labels=targets)
        loss = outputs["loss"]
        tr_logits = outputs["logits"]
        tr_loss += loss.items()

        nb_tr_steps += 1
        nb_tr_examples += targets.size(0)

        if idx % 100 == 0:
            loss_step = tr_loss / nb_tr_steps
            print(f"Training loss per 100 training steps: {loss_step}")

        # compute training accuracy
        flattened_targets = targets.view(-1) # shape (batch_size * seq_len,)
        active_logits = tr_logits.view(-1, model.num_labels) # shape (batch_size * seq_len, num_labels)
        flattened_predictions = torch.argmax(active_logits, axis=1) # shape (batch_size * seq_len,)
        # now, use mask to determine where we should compare predictions with targets (includes [CLS] and [SEP] token predictions)
        active_accuracy = mask.view(-1) == 1 # active accuracy is also of shape (batch_size * seq_len,)
        targets = torch.masked_select(flattened_targets, active_accuracy)
        predictions = torch.masked_select(flattened_predictions, active_accuracy)

        tr_preds.extend(predictions)
        tr_labels.extend(targets)

        tmp_tr_accuracy = accuracy_score(targets.cpu().numpy(), predictions.cpu().numpy())
        tr_accuracy += tmp_tr_accuracy

        # gradient clipping
        torch.nn.utils.clip_grad_norm_(
            parameters=model.parameters(), max_norm=MAX_GRAD_NORM
        )

        # backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    epoch_loss = tr_loss / nb_tr_steps
    tr_accuracy = tr_accuracy / nb_tr_steps
    print(f"Training loss epoch: {epoch_loss}")
    print(f"Training accuracy epoch: {tr_accuracy}")

for epoch in range(EPOCHS):
    print(f"Training epoch: {epoch + 1}")
    train(epoch)

Error:

     15         loss = outputs["loss"]
     16         tr_logits = outputs["logits"]
---> 17         tr_loss += loss.items()
     18 
     19         nb_tr_steps += 1

AttributeError: 'Tensor' object has no attribute 'items'

@Emanuel You can see the Colab notebook which I am running: Google Colab

Change loss.items() to loss.item(). A loss tensor is zero-dimensional, and .item() returns its value as a plain Python number; .items() is a dict method, which tensors don't have.
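
By the way, instead of removing .to(device) you can keep the model on the GPU and move each batch onto the same device. A minimal sketch, reusing the model and training_loader from your notebook:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

for idx, batch in enumerate(training_loader):
    # move the batch tensors onto the same device as the model
    ids = batch['ids'].to(device)
    mask = batch['mask'].to(device)
    targets = batch['targets'].to(device)
    outputs = model(input_ids=ids, attention_mask=mask, labels=targets)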

Thanks, it worked! I have now trained the model and saved it, together with the tokenizer. I am completely new to Hugging Face, so how do I load them back and make predictions? @Emanuel

I think you can try loading with:

from transformers import AutoModel

model = AutoModel.from_pretrained('path/to/your/model')
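
Since you trained a token classification head, you may prefer the task-specific auto class so the classification head is loaded along with the encoder. A minimal sketch, assuming the path points at the directory written by save_pretrained():

from transformers import AutoModelForTokenClassification, AutoTokenizer

# placeholder path; use the directory you saved with save_pretrained()
model = AutoModelForTokenClassification.from_pretrained("path/to/your/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/your/model")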

A quick way to make predictions with your model / tokenizer is with the pipeline() function, e.g.

from transformers import pipeline

# Note: the model and tokenizer directories are usually the same
ner_tagger = pipeline("ner", model="path/to/your/model/dir", tokenizer="path/to/your/tokenizer/dir")

text = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

entities = ner_tagger(text)
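
If you would rather get whole-word entities back instead of individual word pieces, recent versions of the token classification pipeline also accept an aggregation strategy. A sketch with placeholder paths:

from transformers import pipeline

ner_tagger = pipeline(
    "ner",
    model="path/to/your/model/dir",
    tokenizer="path/to/your/tokenizer/dir",
    aggregation_strategy="simple",  # merge subword tokens into whole-word entities
)

print(ner_tagger("Hugging Face Inc. is a company based in New York City."))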

I am doing the same; see what I am getting, @lewtun:

from transformers import pipeline

# Note: the model and tokenizer directories are usually the same
ner_tagger = pipeline("ner", model="E:\model\config.json", tokenizer="E:\model\vocab.txt")

text = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

entities = ner_tagger(text)

ValueError: Could not load model E:\model\config.json with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForTokenClassification'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForTokenClassification'>, <class 'transformers.models.bert.modeling_bert.BertForTokenClassification'>, <class 'transformers.models.bert.modeling_tf_bert.TFBertForTokenClassification'>).

Hey @ayush488, the model and tokenizer arguments should point to the directory where you saved the model / tokenizer with the save_pretrained() method. In other words, does the following work?

from transformers import pipeline

# Note: the model and tokenizer directories are usually the same
ner_tagger = pipeline("ner", model="E:\model", tokenizer="E:\model")

text = """Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO,
therefore very close to the Manhattan Bridge which is visible from the window."""

entities = ner_tagger(text)

I did the same but got an error: ValueError: unable to parse E:\model\model\config.json as a URL or as a local path

Hmm, looking at the error, it seems the pipeline is looking for a nested directory like model\model. Do you have all the model files in a subdirectory?

Yes! I changed the structure a little (see the attached screenshot of the directory layout), and here is the code for the same:

from transformers import AutoTokenizer, AutoModel 
model = AutoModel.from_pretrained(r"E:\model\model") 
tokenizer = AutoTokenizer.from_pretrained(r"E:\model\tokenizer") 

I am getting an error which is ValueError: unable to parse E:\model\model\config.json as a URL or as a local path

Hey, I made it work! I think it was happening because I trained the model on GPUs in Kaggle kernels, then downloaded it and tried to run it on CPU!
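
For anyone hitting the same thing: if the weights were saved with torch.save() on a GPU machine, you can map them onto the CPU when loading. A minimal sketch with hypothetical file names and label count (if you used save_pretrained(), from_pretrained() handles the device for you):

import torch
from transformers import BertForTokenClassification

# hypothetical checkpoint path and num_labels; adjust to your own setup
model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=17)
state_dict = torch.load("checkpoint.pt", map_location=torch.device("cpu"))
model.load_state_dict(state_dict)
model.eval()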


Great that it finally worked!