How to use additional input features for sentiment analysis with BERT?

Let me explain my issue clearly. I have two lists: one is a list of input comments, and the other is a list of the customer names belonging to the respective comments.

comment = ["I hate the food", "Service is bad", "Service is bad but food is good", "Food is delicious"]
customer_name = ["umar","adil","cameron","Daniel"]

The mapping between them is like this:

| comment | customer_name |
|---|---|
| I hate the food | umar |
| Service is bad | adil |
| Service is bad but food is good | cameron |
| Food is delicious | Daniel |

This is just a sample dataset. I have thousands of records in my dataset.

I am doing sentiment analysis using a BERT model, but now I want to add `customer_name` as an additional feature to the model.

Right now this is how I am tokenizing my input sentences and generating `input_ids` and `attention_masks`:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
input_ids = []
attention_masks = []
for sentence in df['Comment'].tolist():
    # encode_plus returns a dictionary with the encoded inputs
    dictionary = tokenizer.encode_plus(
        sentence,
        add_special_tokens=True,     # add [CLS] and [SEP]
        max_length=512,
        padding='max_length',        # pad_to_max_length is deprecated
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt',
    )
    input_ids.append(dictionary['input_ids'])
    attention_masks.append(dictionary['attention_mask'])

You can see that I am passing the comment as input, and this function returns the `input_ids` and `attention_mask` for each comment. I then create the `input_ids` layer and `attention_mask` layer and feed them into my BERT model. But now I want the customer name as an additional feature in my BERT model.

I asked a few experts before, and they told me that the traditional way is to hold an embedding for each category and concatenate it (or combine the features in some other way) with the contextualized output of BERT before feeding it to the classification layer.
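From what I understand, the concatenation part would look something like the sketch below. Note this is only my own attempt with made-up names (`BertWithCategory`, `cat_embedding`), and I use a random tensor in place of BERT's real pooled output just to check the shapes:

```python
import torch
import torch.nn as nn

class BertWithCategory(nn.Module):
    """Hypothetical head: concatenate a category embedding with
    BERT's pooled output before the classification layer."""
    def __init__(self, num_categories, cat_emb_dim=16, bert_hidden=768, num_labels=2):
        super().__init__()
        self.cat_embedding = nn.Embedding(num_categories, cat_emb_dim)
        self.classifier = nn.Linear(bert_hidden + cat_emb_dim, num_labels)

    def forward(self, pooled_output, category_ids):
        cat_vec = self.cat_embedding(category_ids)               # (batch, cat_emb_dim)
        combined = torch.cat([pooled_output, cat_vec], dim=-1)   # (batch, 768 + 16)
        return self.classifier(combined)

# dummy pooled outputs standing in for the real BERT output
model = BertWithCategory(num_categories=4)
pooled = torch.randn(2, 768)          # batch of 2 "comments"
cats = torch.tensor([0, 3])           # their category (customer) ids
logits = model(pooled, cats)
print(logits.shape)                   # torch.Size([2, 2])
```

Is this roughly what the experts meant by concatenating an embedding with the BERT output?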

I searched for it on the internet, and this is what I found:

# batch with two sentences (i.e. the citation text you have already used);
# `t` is the tokenizer
i = t(["paper title 1", "paper title 2"], padding=True, return_tensors="pt")

# Assume the first sentence (paper title 1) belongs to category 23
# and the second sentence to category 42.
# You probably want to use a dictionary in your own code.
i["categorical_feature_ids"] = torch.tensor([23, 42])

Because I am not an expert in ML, I have a big confusion here:

Where do we give our additional feature (customer name) as input? For example, we give our input (the comment) to the BERT tokenizer's encode method, but what about the other features (customer name)? It's fine that we can create an extra embedding layer for each feature, but how do we inject those features (as input) into the model?
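One part I think I do understand: an embedding layer takes integer ids, not strings, so I assume the customer names first need to be mapped to ids, something like this (again, my own sketch, using my sample lists):

```python
customer_name = ["umar", "adil", "cameron", "Daniel"]

# assign each distinct name a stable integer id
# (repeated names in a bigger dataset would map to the same id)
name2id = {name: idx for idx, name in enumerate(sorted(set(customer_name)))}
customer_ids = [name2id[name] for name in customer_name]

print(name2id)        # {'Daniel': 0, 'adil': 1, 'cameron': 2, 'umar': 3}
print(customer_ids)   # [3, 1, 2, 0]
```

Is that the right first step, and then these ids are what get fed into the extra embedding layer?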

The piece of code above is not enough for me to understand this.