Finding An Appropriate Dataset

AIdrive · August 17, 2023, 1:42am

Hi all,

I have recently started working on an AI app to detect vulnerabilities in C/C++ source code. My understanding is that, for training and validation, I need to feed the model both safe and unsafe code examples. The problem is that I cannot find a dataset that clearly separates the two: every one I have found either contains only unsafe code examples, or consists of a single file (pkl or json) in which safe and unsafe examples are merged together.

I expected that some datasets would be organized with, say, one directory (or file) containing only safe examples and another containing only unsafe ones.
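To make that layout concrete, here is a minimal sketch of how a merged JSON file could be split into `safe/` and `unsafe/` directories, provided each record carries a per-example label. The key names `code` and `label` (1 = unsafe, 0 = safe), the helper names, and the output file naming are all assumptions for illustration; they are not taken from any particular dataset, so check the schema of the file you actually have.

```python
import json
import tempfile
from pathlib import Path


def load_records(path):
    """Load a merged dataset stored as a JSON list of dicts (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def split_by_label(records, out_dir, code_key="code", label_key="label"):
    """Write each record's source into out_dir/safe or out_dir/unsafe.

    Assumes a binary label where 1 means unsafe and 0 means safe; both key
    names are hypothetical. Returns the number of files written per class.
    """
    out_dir = Path(out_dir)
    counts = {"safe": 0, "unsafe": 0}
    for record in records:
        bucket = "unsafe" if int(record[label_key]) == 1 else "safe"
        target = out_dir / bucket
        target.mkdir(parents=True, exist_ok=True)
        # Files are numbered per class: 000000.c, 000001.c, ...
        (target / f"{counts[bucket]:06d}.c").write_text(
            record[code_key], encoding="utf-8"
        )
        counts[bucket] += 1
    return counts


if __name__ == "__main__":
    # Tiny stand-in for a merged dataset file, written to a temp directory.
    sample = [
        {"code": "char b[8]; strcpy(b, s);", "label": 1},
        {"code": "char b[8]; snprintf(b, sizeof b, \"%s\", s);", "label": 0},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        merged_path = Path(tmp) / "merged.json"
        merged_path.write_text(json.dumps(sample), encoding="utf-8")
        print(split_by_label(load_records(merged_path), Path(tmp) / "split"))
```

This only works if the merged file has a label for every example; a file without labels cannot be separated this way.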

Any help here would be appreciated.

Thanks.
