Create custom tags for fine-tuning BERT for an NER task

I'd like to fine-tune BERT for NER as a token classification task. The primary reason for using NER is that it would encode the entire document at once, which would teach the model where the INCOME and EXPENSE sections sit within the document. With vanilla BERT we classify each line item individually, potentially losing context from other relevant text in the document.

I have a pandas dataframe with some texts and labels.

import pandas as pd
import numpy as np

# Set a seed for reproducibility
np.random.seed(42)

# Generate dummy data
data = {
    'Text': [
        "Paid utility bills",
        "Salary received from XYZ Corp",
        "Dinner expenses at ABC Restaurant",
        "Investment dividends",
        "Rent payment"
    ],
    'Label': ['Expense', 'Income', 'Expense', 'Income', 'Expense']
}

# Create a pandas DataFrame
df = pd.DataFrame(data)

How do I transform my dataset for a BERT NER task? That is, how do I insert custom tags so that the data ends up in a format similar to other datasets used for NER, such as the example below?

I am expecting the data to be something like this:

Salary received from XYZ Corp [SEP] Investment dividends [SEP] Rent payment
Inc-B Inc-I Inc-I Inc-I Inc-I O Inc-B Inc-I O Exp-B Exp-I
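
This is my rough attempt at producing that format from the dataframe above. It is only a sketch: I am assuming whitespace tokenization, the Inc-B/Inc-I/Exp-B/Exp-I tag names from the example, and an O tag on the [SEP] separators.

def to_bio(df):
    """Join all rows into one token sequence with word-level BIO-style tags."""
    tokens, tags = [], []
    for i, row in df.iterrows():
        prefix = "Inc" if row["Label"] == "Income" else "Exp"
        words = row["Text"].split()          # assuming simple whitespace tokenization
        tokens.extend(words)
        tags.extend([f"{prefix}-B"] + [f"{prefix}-I"] * (len(words) - 1))
        if i < len(df) - 1:                  # separator token between line items
            tokens.append("[SEP]")
            tags.append("O")
    return tokens, tags

tokens, tags = to_bio(df)
print(list(zip(tokens, tags)))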

For comparison, a standard NER model tags generic entity types, e.g. "XYZ Corp" in "Salary received from XYZ Corp" would get an [ORG] tag, similar to how the reference model below tags PER and LOC:

from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("EvanD/dutch-ner-xlm-conll2003")
ner_model = AutoModelForTokenClassification.from_pretrained("EvanD/dutch-ner-xlm-conll2003")

nlp = pipeline("ner", model=ner_model, tokenizer=tokenizer, aggregation_strategy="simple")
example = "George Washington ging naar Washington"

ner_results = nlp(example)
print(ner_results)

# [
#     {
#         "entity_group": "PER",
#         "score": 0.9999986886978149,
#         "word": "George Washington",
#         "start": 0,
#         "end": 17
#     },
#     {
#         "entity_group": "LOC",
#         "score": 0.9999939203262329,
#         "word": "Washington",
#         "start": 28,
#         "end": 38
#     }
# ]
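
If that is the right direction, I assume the next step is aligning the word-level tags with BERT's subword tokens before fine-tuning. Below is my sketch of that step; the bert-base-uncased checkpoint and the label list are my own assumptions, and I'm relying on the fast tokenizer's word_ids() to map subwords back to words, with -100 so that special tokens are ignored by the loss.

from transformers import AutoTokenizer

label_list = ["O", "Inc-B", "Inc-I", "Exp-B", "Exp-I"]
label2id = {label: i for i, label in enumerate(label_list)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# tokens/tags come from the to_bio sketch above
encoding = tokenizer(tokens, is_split_into_words=True, truncation=True)

labels = []
for word_idx in encoding.word_ids():
    if word_idx is None:
        labels.append(-100)                      # [CLS]/[SEP] added by the tokenizer
    else:
        labels.append(label2id[tags[word_idx]])  # every subword inherits its word's tag
encoding["labels"] = labels

Is this the standard way to prepare custom tags for a BERT token classification fine-tune, or is there a built-in utility I'm missing?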