Problems with automatic virus scanning using ClamAV

iambestfeed · July 12, 2025, 1:49pm

My file is a subset split from the file

However, my file is just a Parquet file split from a Parquet file that I created myself with datasets, so what is wrong with it? And is there a way to have it rescanned and the flag removed?

About the script:

        # Split and upload
        for i in range(num_chunks):
            start_idx = i * chunk_size
            end_idx = min((i + 1) * chunk_size, num_rows)
            chunk_df = df.iloc[start_idx:end_idx]
            
            # Build the file name: original_name_k.parquet
            output_filename = f"{base_name}_{i}.parquet"
            temp_file = f"temp_{file_idx}_{i}.parquet"
            
            try:
                chunk_df.to_parquet(temp_file, index=False)
                
                # Upload, preserving the directory structure
                target_path = f"data/vie_Latn/train/{output_filename}"
                
                print(f"   ⬆️  Uploading {output_filename} ({len(chunk_df):,} samples)...")
                
                api.upload_file(
                    path_or_fileobj=temp_file,
                    path_in_repo=target_path,
                    repo_id=target_repo,
                    repo_type="dataset",
                    commit_message=f"Add {output_filename}"
                )
                print(f"   ✅ Uploaded {output_filename}")
                
            except Exception as e:
                print(f"   ❌ Failed to process chunk {i}: {e}")
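
For reference, the excerpt above does not show how num_chunks is computed. Here is a minimal sketch of the chunk-boundary logic, assuming num_chunks is the ceiling of num_rows / chunk_size. The helper name chunk_bounds is hypothetical and not part of the original script.

```python
import math


def chunk_bounds(num_rows, chunk_size):
    """Yield (start_idx, end_idx) pairs that cover range(num_rows) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Assumption: the script derives num_chunks by rounding up.
    num_chunks = math.ceil(num_rows / chunk_size)
    for i in range(num_chunks):
        start_idx = i * chunk_size
        # The last chunk may be shorter than chunk_size.
        end_idx = min((i + 1) * chunk_size, num_rows)
        yield start_idx, end_idx


if __name__ == "__main__":
    # 10 rows in chunks of 4 gives two full chunks and one partial chunk.
    print(list(chunk_bounds(10, 4)))
```

Each pair can be passed to df.iloc[start_idx:end_idx] exactly as in the loop above, so every row ends up in exactly one output file.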

John6666 · July 12, 2025, 1:51pm

@meganariley ClamAV issue.
