BERT Model predicting 'PAD' for NER

Hi everyone! I have a quick question about BERT predictions for NER. For background, I'm using BertForTokenClassification on large bodies of text, which has required chunking each text into BERT-sized inputs. I drew heavily on this tutorial (Named entity recognition with Bert) as inspiration for my workflow. One thing I don't understand is using the final model for predictions: looking at the output, I'm seeing a fair number of 'PAD' tokens predicted. I understand that in creating and training the model I provided attention masks so the model wouldn't train on padding, but I'm now unsure how to appropriately filter the output (or whether that's even necessary) so that 'PAD' isn't predicted. Some of the prediction code is below, along with a sample of the output. Just FYI, the combined logits are used to merge predictions across the chunked input.

        if torch.cuda.is_available():
            input_ids = torch.tensor([tokenized_test_text]).cuda()
        else:
            input_ids = torch.tensor([tokenized_test_text])

        with torch.no_grad():
            output = model(input_ids)

        # .cpu() is a no-op on CPU tensors, so one code path handles both devices
        scores = output[0].detach().cpu().numpy()
        label_indices = np.argmax(scores, axis=2)
        logits = np.max(scores, axis=2)

        final_labels = [tag_values[label_idx] for label_idx in label_indices[0]]

['B-tumor_type', 'I-spec_id', 'B-n_lymph_nodes_examined_fraction', '0', '0', 'B-n_lymph_nodes_examined_fraction', 'PAD', '0', '0', 'B-n_lymph_nodes_examined_fraction', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', 'B-n_lymph_nodes_involved', '0', '0']
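One direction I've been considering is dropping predictions at padded positions (and any stray 'PAD' predictions on real tokens) before building `final_labels`. A minimal sketch with toy data, assuming the pad token id is 0 and that `tag_values` contains a 'PAD' entry as in my setup (the specific ids and tags here are made up for illustration):

```python
import numpy as np

# Hypothetical example data standing in for one chunk of real output
tag_values = ['O', 'B-tumor_type', 'PAD']
input_ids = [101, 2054, 2003, 102, 0, 0]      # trailing zeros are padding
label_indices = np.array([1, 0, 0, 0, 2, 2])  # argmax over the logits

# Keep only positions whose input token is not padding, and additionally
# drop any residual 'PAD' predictions on real tokens
final_labels = [tag_values[idx]
                for tok, idx in zip(input_ids, label_indices)
                if tok != 0 and tag_values[idx] != 'PAD']
# final_labels -> ['B-tumor_type', 'O', 'O', 'O']
```

Would filtering like this be the right approach, or should 'PAD' never appear in the first place if the attention masks were used correctly during training?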