BERT Model predicting 'PAD' for NER

Hi everyone! I have a quick question about BERT predictions for NER. As background, I'm using BertForTokenClassification on large bodies of text, which has required chunking each long document into BERT-sized inputs. I used much of the workflow from here (Named entity recognition with Bert) as inspiration for the project.

One thing I don't understand is using the final model for predictions: looking at the output, I'm seeing a decent number of 'PAD' tokens predicted. I understand that when creating and training the model I provided attention masks so that padding wouldn't be trained on, but I'm now unsure how to appropriately filter the output (or whether that's even necessary) so that 'PAD' isn't predicted. Some of the code I use for predictions with the final model is below, along with a sample of the output. Just FYI, combined_logits is there to combine predictions across the chunked input.
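For context, the chunking step is roughly like the sketch below (the chunk size and the function/variable names here are illustrative assumptions, not my exact code):

    # Illustrative sketch of the chunking step: split a long tokenized document
    # into consecutive BERT-sized pieces. MAX_LEN and the names are assumptions.
    MAX_LEN = 510  # leave room for the [CLS] and [SEP] special tokens

    def chunk_token_ids(token_ids, max_len=MAX_LEN):
        """Split a long list of token ids into chunks of at most max_len."""
        return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

Each chunk is then run through the prediction code below as tokenized_test_text.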

        # This runs once per chunk: tokenized_test_text is one chunk's token ids,
        # and combined_logits / combined_labels accumulate results across chunks.
        # (torch and numpy as np are imported earlier in the script.)
        input_ids = torch.tensor([tokenized_test_text])
        if torch.cuda.is_available():
            input_ids = input_ids.cuda()

        with torch.no_grad():
            output = model(input_ids)

        # output[0] holds the token-classification logits, shape (1, seq_len, num_labels)
        token_logits = output[0].detach().cpu().numpy()
        label_indices = np.argmax(token_logits, axis=2)  # predicted label id per token
        logits = np.max(token_logits, axis=2)            # highest logit per token

        combined_logits.extend(logits[0])
        final_labels = [tag_values[label_idx] for label_idx in label_indices[0]]
        combined_labels.extend(final_labels)

    ['B-tumor_type', 'I-spec_id', 'B-n_lymph_nodes_examined_fraction', '0', '0', 'B-n_lymph_nodes_examined_fraction', 'PAD', '0', '0', 'B-n_lymph_nodes_examined_fraction', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', 'B-n_lymph_nodes_involved', '0', '0']
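To make it concrete, by "filtering" I mean something like the sketch below, i.e. dropping positions whose input id is the pad token. The tokenizer / pad_token_id names are assumptions on my part, and I haven't verified this is the right approach (or whether it even applies, since I'm not sure the chunks above are actually padded):

    # Just an idea for filtering, not something I've verified: keep only positions
    # whose input id is not the tokenizer's pad token id.
    pad_token_id = tokenizer.pad_token_id  # tokenizer used to encode the chunks (assumption)

    keep = [tok_id != pad_token_id for tok_id in tokenized_test_text]
    filtered_labels = [lab for lab, k in zip(final_labels, keep) if k]
    filtered_logits = [logit for logit, k in zip(logits[0], keep) if k]

Is something like this the right way to handle it, or am I missing something?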