The dataset viewer only displays the videos and does not show other fields?

I created a Parquet file locally with the following content:

    video_id     label      description                    video_path
0  00019.mp4   neutral         It's me.  test_hf_data/video/00019.mp4
1  00020.mp4  surprise     I remember it!  test_hf_data/video/00020.mp4
2  00021.mp4     anger  I want to go home.  test_hf_data/video/00021.mp4
3  00022.mp4      fear       I may die.  test_hf_data/video/00022.mp4
4  00024.mp4     happy   I am beautiful!  test_hf_data/video/00024.mp4

However, after uploading it to Hugging Face, the dataset viewer only displays the videos and does not show the label, description, video_id, or other fields. Why is this happening?

ZebangCheng/test_hf_data · Datasets at Hugging Face


When I looked at the repository, it seems that it is not in the Hugging Face datasets library format. I think that is the cause.

If you somehow load it in the datasets library and save it, it will be saved as a datasets library-style parquet automatically.


Hi! You should use a metadata file named "metadata.csv" (or metadata.jsonl / metadata.parquet) with a file_name field and it will work :)

(Same as for image or audio datasets)

I'll update the docs soon
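
For anyone landing here later, a minimal sketch of turning the rows from the original question into a metadata.csv with a file_name column. It assumes metadata.csv sits in the same folder as the .mp4 files, so file_name is just the video's file name; the helper name `metadata_csv_text` is made up for illustration and is not part of any library.

```python
import csv
import io

# Rows copied from the table in the original question.
records = [
    {"video_id": "00019.mp4", "label": "neutral", "description": "It's me."},
    {"video_id": "00020.mp4", "label": "surprise", "description": "I remember it!"},
    {"video_id": "00021.mp4", "label": "anger", "description": "I want to go home."},
    {"video_id": "00022.mp4", "label": "fear", "description": "I may die."},
    {"video_id": "00024.mp4", "label": "happy", "description": "I am beautiful!"},
]


def metadata_csv_text(rows):
    """Return metadata.csv content with file_name as the first column.

    Hypothetical helper: it only renames video_id to file_name and keeps
    the remaining fields as ordinary columns.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["file_name", "label", "description"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {
                "file_name": row["video_id"],  # path relative to metadata.csv
                "label": row["label"],
                "description": row["description"],
            }
        )
    return buffer.getvalue()


text = metadata_csv_text(records)
```

The resulting text can then be saved as metadata.csv inside the video folder, for example with `pathlib.Path("your_folder/metadata.csv").write_text(text, encoding="utf-8")` (the folder name is a placeholder).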


Thank you for your reply.

I used a metadata.csv file with the following format:

file_name,label,description
00019.mp4,neutral,It's me.
00020.mp4,surprise,I remember it!
00021.mp4,anger,I want to go home.
00022.mp4,fear,I may die.
00024.mp4,happy,I am beautiful!

Then, I uploaded the dataset to Hugging Face using the following code:

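from datasets import load_dataset

# Only the CSV is loaded here; the video files themselves are never read.
dataset = load_dataset('csv', data_files={'train': 'test_hf_data_3/metadata.csv'})
# video_path ends up as a plain string column copied from file_name.
dataset = dataset.map(lambda x: {"video_path": x['file_name']})

dataset.push_to_hub("ZebangCheng/test_hf_data_3")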

In the end, the uploaded data looks like this, and both label and description are displayed correctly:

ZebangCheng/test_hf_data_3 · Datasets at Hugging Face

However, the video is not displayed properly. I would like the Dataset Viewer to display both the video and the other fields at the same time, but the two seem to conflict: when the video is displayed properly, the other fields (label and description) do not show, and when the other fields display correctly, the video doesn't appear.

I look forward to the updated documentation, as it would help me better understand how to handle this.


You should upload your folder of [metadata.csv + videos] as is. I think push_to_hub doesn't support video types well at the moment.

e.g. using HfApi().upload_folder(…)
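
A rough sketch of what that upload could look like, assuming the folder already contains metadata.csv next to the videos. The helper names `missing_videos` and `upload_video_folder` and the repo id in the example call are placeholders, not anything from the thread; `HfApi.create_repo` and `HfApi.upload_folder` are the huggingface_hub calls being relied on. The local check runs without network access, while the upload itself needs a logged-in token.

```python
import csv
from pathlib import Path


def missing_videos(folder):
    """Return file_name entries in metadata.csv that have no matching file."""
    folder = Path(folder)
    with (folder / "metadata.csv").open(newline="", encoding="utf-8") as f:
        names = [row["file_name"] for row in csv.DictReader(f)]
    return [name for name in names if not (folder / name).is_file()]


def upload_video_folder(folder, repo_id):
    """Upload the folder (metadata.csv + videos) unchanged to a dataset repo."""
    # Imported lazily so the local check works without huggingface_hub installed.
    from huggingface_hub import HfApi

    missing = missing_videos(folder)
    if missing:
        raise FileNotFoundError(f"metadata.csv lists missing files: {missing}")

    api = HfApi()  # picks up the token from a prior login or the HF_TOKEN variable
    api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
    api.upload_folder(folder_path=str(folder), repo_id=repo_id, repo_type="dataset")


# Example call (placeholder repo id; requires network access and a login):
# upload_video_folder("test_hf_data_3", "your-username/your-video-dataset")
```

Checking the file names before uploading is an optional extra step, added here because a file_name that does not resolve to a real file is an easy mistake to make.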


Thank you for your guidance.

I have found some open-source datasets and will follow their format to upload and display video data. If successful, I may write some blog posts to document the process and help others.

Also, if the "documentation" you mentioned earlier is ready, please feel free to @-mention me.

Thanks again!


The docs are ready!


Thank you for your reminder. I have successfully resolved this issue.

