Tutorial: Fine-tuning with custom datasets – sentiment, NER, and question answering

@joeddav Thank you for the tutorial. I was trying to replicate the fine-tuning code with a different dataset and it worked. But when I changed the pretrained model from DistilBERT to something else like RoBERTa or XLNet, I got an error in the encoding function.

This is the encoding function:

import numpy as np

def encode_tags(tags, encodings):
    # tag2id maps tag strings to integer ids (built earlier from the unique tags)
    labels = [[tag2id[tag] for tag in doc] for doc in tags]
    encoded_labels = []
    for doc_labels, doc_offset in zip(labels, encodings.offset_mapping):
        # create an array of -100, the label index ignored by the loss
        doc_enc_labels = np.ones(len(doc_offset), dtype=int) * -100
        arr_offset = np.array(doc_offset)
        # set labels where the first offset is 0 and the second is not,
        # i.e. at the first subtoken of each original word
        doc_enc_labels[(arr_offset[:, 0] == 0) & (arr_offset[:, 1] != 0)] = doc_labels
        encoded_labels.append(doc_enc_labels.tolist())

    return encoded_labels
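
For context, this is roughly how I build the encodings and call the function, following the tutorial (the model name, the toy data, and the tag2id mapping below are just stand-ins for my real dataset; on older transformers versions the flag is is_pretokenized instead of is_split_into_words):

from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-cased')

# tiny stand-in for the real data: pre-tokenized words with per-word tags
train_texts = [['Hugging', 'Face', 'is', 'in', 'New', 'York']]
train_tags = [['B-ORG', 'I-ORG', 'O', 'O', 'B-LOC', 'I-LOC']]
tag2id = {'O': 0, 'B-ORG': 1, 'I-ORG': 2, 'B-LOC': 3, 'I-LOC': 4}

train_encodings = tokenizer(train_texts,
                            is_split_into_words=True,
                            return_offsets_mapping=True,
                            padding=True,
                            truncation=True)
train_labels = encode_tags(train_tags, train_encodings)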

It doesn't throw an error when I use BERT or DistilBERT as the pretrained model and tokenizer, but with some other model in their place, this is the error I get:

Traceback (most recent call last):
  File "huggingFace_NER.py", line 187, in <module>
    train_labels = encode_tags(train_tags, train_encodings)
  File "huggingFace_NER.py", line 70, in encode_tags
    doc_enc_labels[(arr_offset[:,0] == 0) & (arr_offset[:,1] != 0)] = doc_labels
ValueError: NumPy boolean array indexing assignment cannot assign 100 input values to the 80 output values where the mask is true
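
In case it helps to see the mismatch directly, here is a small sketch that counts how many positions the mask selects per tokenizer (the model names are just examples, and note that RoBERTa's fast tokenizer needs add_prefix_space=True to accept pre-tokenized input):

import numpy as np
from transformers import AutoTokenizer

words = ['Hugging', 'Face', 'is', 'in', 'New', 'York']

for name in ['distilbert-base-cased', 'roberta-base', 'xlnet-base-cased']:
    # RoBERTa's byte-level tokenizer refuses word lists without a prefix space
    kwargs = {'add_prefix_space': True} if name == 'roberta-base' else {}
    tok = AutoTokenizer.from_pretrained(name, use_fast=True, **kwargs)
    enc = tok(words, is_split_into_words=True, return_offsets_mapping=True)
    offsets = np.array(enc['offset_mapping'])
    mask = (offsets[:, 0] == 0) & (offsets[:, 1] != 0)
    # whenever this count differs from len(words), the boolean
    # assignment in encode_tags raises exactly the ValueError above
    print(name, 'word labels:', len(words), 'masked positions:', mask.sum())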