I’m currently trying to build a classifier that labels source-code functions as vulnerable or non-vulnerable (0 or 1), and I want the model to attend to the tokens it should consider important. My dataset is built from security patches: the pre-patch version of each patched function is labeled vulnerable. I want the model to treat the tokens changed by the patch as more important, but I don’t want it to overfit to the mere presence of those special tokens. I’m using CodeBERT, and my plan is to prepend the important tokens to the actual function, like this: [CLS] important tokens [SEP] function [SEP]
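For reference, here is a minimal sketch of how I’m constructing that paired input with the Hugging Face tokenizer (the function and important tokens below are made-up placeholders; note that CodeBERT uses RoBERTa’s &lt;s&gt;/&lt;/s&gt; special tokens, which play the role of [CLS]/[SEP]):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")

# Hypothetical example: in practice these come from the patch diff
important_tokens = "strcpy buf len"  # tokens changed by the patch
function_code = "void f(char *src) { char buf[8]; strcpy(buf, src); }"

# Encoding the two texts as a pair produces
# <s> important tokens </s></s> function </s>,
# i.e. the RoBERTa equivalent of [CLS] ... [SEP] ... [SEP]
encoding = tokenizer(
    important_tokens,
    function_code,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
print(tokenizer.decode(encoding["input_ids"][0]))
```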
My problem is that these important tokens would only show up in vulnerable functions, so I expect the model to overfit to their presence. Is there any way I can get around this?