Trojan in common_voice dataset?

Hi @lhoestq ,

I have also noticed that other datasets, such as the one used in the CodeParrot training blog post, contain files that the scan flags as unsafe (trojans, malware, spyware, etc.). They all appear to be non-executable, since they are stored in JSON format inside the Arrow files. For example, this file is marked as unsafe: Win.Trojan.MSShellcode-88.
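To illustrate what I mean, here is a minimal sketch, assuming the scanner works by matching byte patterns in the stored data. The marker string and the `naive_scan` helper are made up for illustration and are not a real signature or a real scanner API. It shows that a pattern embedded in a text field can trigger a match even though the field is only ever loaded as a string:

```python
import json

# Hypothetical marker standing in for a scanner signature; not a real malware pattern.
FAKE_SIGNATURE = "FAKE-SIGNATURE-0000"


def naive_scan(raw: bytes, signature: str) -> bool:
    """Return True if the signature bytes occur anywhere in the raw data."""
    return signature.encode("utf-8") in raw


# A dataset row that stores source code as a plain string field.
row = {
    "path": "example.c",
    "content": f"/* {FAKE_SIGNATURE} */ int main(void) {{ return 0; }}",
}

# Serialized form, roughly what a byte-level scanner would see on disk.
raw = json.dumps(row).encode("utf-8")

# The scan matches on the bytes alone.
flagged = naive_scan(raw, FAKE_SIGNATURE)

# Loading the row only yields a string; nothing in it is executed.
decoded = json.loads(raw)
print(flagged, type(decoded["content"]).__name__)
```

If that assumption holds, the flag would only tell us that the pattern is present in the text, not that the file can run anything.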

Thank you,

Enrico