Three parquet files of my dataset were marked unsafe

Hi, three parquet files in my dataset OmniCorpus-CC were marked unsafe.

ClamAV: Unsafe
The following viruses have been found: Win.Trojan.Javel-1, Win.Trojan.Javel-1

I scanned one of the flagged files locally with ClamAV and got:

% clamscan Desktop/train-00005-of-00053.parquet --verbose
Loading:     7s, ETA:   0s [========================>]    8.70M/8.70M sigs       
Compiling:   1s, ETA:   0s [========================>]       41/41 tasks 

Scanning /Users/lqy/Desktop/train-00005-of-00053.parquet
/Users/lqy/Desktop/train-00005-of-00053.parquet: OK

----------- SCAN SUMMARY -----------
Known viruses: 8699167
Engine version: 1.4.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 0.00 MB
Data read: 469.64 MB (ratio 0.00:1)
Time: 8.486 sec (0 m 8 s)
Start Date: 2024:10:24 16:38:54
End Date:   2024:10:24 16:39:02

I also scanned the file with VirusTotal, and it did not report anything unsafe.

Could these be three false positives, like the cases in this discussion?

How can I resolve this unsafe tag? Or can you provide a detailed description of the unsafe content?
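
In case it helps anyone repeat the local check on all flagged shards at once, here is a minimal sketch in Python. It assumes `clamscan` is on the PATH and that its per-file lines look like the output above (`<path>: OK`, or `<path>: <signature> FOUND` for a hit). The helper names `parse_clamscan_output` and `scan_files` are made up for this example and are not part of ClamAV or the Hub.

```python
import subprocess
from typing import Dict, List


def parse_clamscan_output(text: str) -> Dict[str, str]:
    """Map each scanned path to "OK" or to the signature name that was reported."""
    results: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(": OK"):
            results[line[: -len(": OK")]] = "OK"
        elif line.endswith(" FOUND"):
            # Line shape assumed: "<path>: <signature> FOUND"
            path, _, signature = line[: -len(" FOUND")].rpartition(": ")
            if path:
                results[path] = signature
    return results


def scan_files(paths: List[str]) -> Dict[str, str]:
    """Run clamscan on the given files and return the per-file verdicts."""
    # --no-summary drops the "SCAN SUMMARY" block, leaving only per-file lines.
    # clamscan exits with status 1 when it flags a file, so check=True is not used.
    proc = subprocess.run(
        ["clamscan", "--no-summary", *paths],
        capture_output=True,
        text=True,
    )
    return parse_clamscan_output(proc.stdout)
```

Calling `scan_files(["Desktop/train-00005-of-00053.parquet"])` should then return a dict with one entry for that file. A local result of `OK` only shows that the local signature database does not match; it does not explain why the Hub scan flagged the file.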

Dear @mcpotato:
Could you please help me re-scan the flagged files and update their safety tag?

It seems that @mcpotato's top replies include many discussions about the same issue.

Files in JSONL format also raise the same warning.

Please help, thank you very much.
