Why does our dataset have unsafe files?

We have uploaded a set of .gz files.

However, on the main page of our dataset (liwu/MNBVC · Datasets at Hugging Face), the following warning is shown:

This dataset has 267 files that have been marked as unsafe.

For each file, the following error can be seen:

Virus: Can’t write to file ERROR

I am wondering why. Why is our dataset marked as unsafe?

These files are just compressed text files, specifically compressed .jsonl files in UTF-8. They are absolutely SAFE.
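
For anyone who wants to check that claim locally, here is a minimal sketch, assuming a file has been downloaded to disk. The function name `check_jsonl_gz` is made up for illustration and is not part of any Hugging Face tooling. It only confirms that the file is gzip-compressed, UTF-8 encoded, and contains one JSON value per line; it says nothing about what an antivirus would flag.

```python
import gzip
import json


def check_jsonl_gz(path: str) -> int:
    """Return the number of JSON records in a gzip-compressed JSONL file.

    Raises OSError (gzip.BadGzipFile) if the file is not valid gzip, and
    ValueError (UnicodeDecodeError or json.JSONDecodeError) if a line is
    not UTF-8 or not valid JSON.
    """
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # ignore blank lines
                json.loads(line)
                count += 1
    return count
```

Calling `check_jsonl_gz("some_file.jsonl.gz")` (the path is a placeholder) returns the record count if the file is well formed and raises otherwise.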


cc @mcpotato

Any update?

Ah sorry about that, an error must have happened during scanning. I’ll check to see what happened and try to re-scan.

The re-scan was broken, but it seems to have gone through this time. Only 27 files are marked as unsafe now, with “valid” flags.

These files are just text in JSON format. How can text be “unsafe”? :sweat_smile:

It depends on the matching rules of the antivirus. Some text files can contain code or a string that the antivirus uses to detect viruses.
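
To illustrate the idea, here is a rough sketch of signature-style matching. It is not how the Hub's scanner actually works: the signature names and byte patterns in `SIGNATURES` are invented for this example, and real antivirus rules are far more elaborate. It only shows why a plain-text record that happens to quote a suspicious-looking string can trip a pattern match.

```python
import gzip
import json

# Hypothetical signatures, made up for illustration only.
SIGNATURES = {
    "Demo.Script.Downloader": b"powershell -enc",
    "Demo.Test.Marker": b"FAKE-VIRUS-TEST-STRING",
}


def scan_gzip_bytes(data: bytes) -> list[str]:
    """Return the names of signatures whose pattern occurs in the decompressed data."""
    raw = gzip.decompress(data)
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in raw)


def make_jsonl_gz(records) -> bytes:
    """Serialize records as UTF-8 JSONL and gzip-compress the result."""
    lines = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
    return gzip.compress(lines.encode("utf-8"))


clean = make_jsonl_gz([{"text": "hello world"}])
flagged = make_jsonl_gz([{"text": "forum post quoting a command: powershell -enc AAAA"}])

print(scan_gzip_bytes(clean))    # []
print(scan_gzip_bytes(flagged))  # ['Demo.Script.Downloader']
```

The second file is still just text, but it contains a byte sequence that one of the (invented) rules looks for, so it gets flagged.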