Hi,
I want to fine-tune BERT for vulnerability detection.
I’ve found several datasets on the subject, including this one on Hugging Face: ‘CyberNative/Code_Vulnerability_Security_DPO’.
The dataset is organized into pairs of vulnerable and fixed code snippets, accompanied by a task description that serves as a question.
However, in this dataset, as in many datasets on the subject, every example is centered on vulnerable code, so all the data end up with the same label: vulnerable.
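To make the single-label problem concrete: since each row pairs a vulnerable snippet with its fix, one option I’ve been wondering about is flattening each pair into two labeled examples (vulnerable = 1, fixed = 0). A minimal sketch, using toy records whose field names ("chosen"/"rejected", DPO-style) are my assumption and not verified against the actual dataset schema:

```python
# Toy records mimicking a DPO-style dataset row.
# Field names "chosen" (fixed code) and "rejected" (vulnerable code)
# are assumptions, not confirmed against the real dataset.
pairs = [
    {"question": "Write a login handler.",
     "chosen": "fixed_code_1", "rejected": "vuln_code_1"},
    {"question": "Parse user-supplied input.",
     "chosen": "fixed_code_2", "rejected": "vuln_code_2"},
]

def flatten_pairs(pairs):
    """Turn each (fixed, vulnerable) pair into two labeled
    classification examples: label 1 = vulnerable, label 0 = fixed."""
    examples = []
    for p in pairs:
        examples.append({"code": p["rejected"], "label": 1})  # vulnerable side
        examples.append({"code": p["chosen"], "label": 0})    # fixed side
    return examples

dataset = flatten_pairs(pairs)
# Yields a balanced two-class set: one vulnerable and one safe
# example per original pair.
```

This would give a balanced binary dataset out of the pairs, but I’m not sure whether treating the fixed snippets as the "not vulnerable" class is sound, which is part of my question.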
Is it still possible to fine-tune BERT with this dataset as it is?
And if so, how can the model’s performance be evaluated at test time, given that every example here is vulnerable?
Thanks in advance!