elsaEU, December 28, 2023, 10:58am

Hi,
I want to add more data to my dataset ELSA_D3:
Note that the filenames are:
train-0-05239
...
train-05239-05239
 
Can I add more parquet files without re-uploading all the existing files, and have the README metadata corrected automatically?
My current upload procedure is:
1. Convert the images to arrow files and store them on disk in N splits.
2. Load the N splits into memory and combine them with datasets.concatenate_datasets().
3. Push the result with Dataset.push_to_hub().
 
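The procedure above can be sketched roughly as follows. This is only an illustration: it assumes each part was written with `Dataset.save_to_disk()`, and the function name `push_parts`, the directory paths, and the repo id are hypothetical.

```python
def push_parts(part_dirs, repo_id, split="train"):
    """Load the parts saved on disk, concatenate them, and push them as one split."""
    # Imported here so the sketch can be read without `datasets` installed.
    from datasets import concatenate_datasets, load_from_disk

    # Assumption: every directory holds a single Dataset written by save_to_disk().
    parts = [load_from_disk(path) for path in part_dirs]
    merged = concatenate_datasets(parts)

    # Uploads the data as parquet shards under the given split name.
    merged.push_to_hub(repo_id, split=split)
    return merged


# Hypothetical usage (needs network access and a Hub login):
# push_parts(["parts/0", "parts/1"], "my-user/my-dataset")
```
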
Now I would like to concatenate another split and upload it without losing the previous data and without messing up the filenames.
Thanks
            
You can push_to_hub to a different split, and then manually edit the YAML header of the README.md to group the data_files together in the same split.
For example:
After pushing a new split train_part2 you will get:
configs:
- config_name: default
  data_files:
  - split: train
    path: default/train-*
  - split: train_part2
    path: default/train_part2-*
 
and you can group the splits together this way:
configs:
- config_name: default
  data_files:
  - split: train
    path:
    - default/train-*
    - default/train_part2-*
 
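If you would rather script the edit than do it by hand, the same regrouping can be expressed on the parsed YAML. The sketch below works on plain Python lists and dicts shaped like the `configs` section above; the helper name `group_splits` is hypothetical, and loading and writing back the README header is left out.

```python
from copy import deepcopy


def _as_list(path):
    """A `path` value may be a single pattern or a list of patterns."""
    return [path] if isinstance(path, str) else list(path)


def group_splits(configs, target, extras, config_name="default"):
    """Return a copy of `configs` with the `extras` splits folded into `target`."""
    configs = deepcopy(configs)
    for config in configs:
        if config.get("config_name") != config_name:
            continue
        entries = config["data_files"]
        by_split = {entry["split"]: entry for entry in entries}
        if target not in by_split:
            raise KeyError(f"split {target!r} not found in config {config_name!r}")
        # Keep the target's own patterns first, then append the extra ones in order.
        paths = _as_list(by_split[target]["path"])
        for name in extras:
            paths += _as_list(by_split[name]["path"])
        by_split[target]["path"] = paths
        # Drop the now-merged splits from the config.
        config["data_files"] = [e for e in entries if e["split"] not in extras]
    return configs


configs = [
    {
        "config_name": "default",
        "data_files": [
            {"split": "train", "path": "default/train-*"},
            {"split": "train_part2", "path": "default/train_part2-*"},
        ],
    }
]
grouped = group_splits(configs, "train", ["train_part2"])
```

Here `grouped` holds a single `train` entry whose `path` lists both patterns, matching the second YAML snippet above, and the input `configs` is left unchanged.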
You’d also have to update the dataset_info section in the YAML to account for the new split size and number of examples (or just delete it).
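A rough sketch of that bookkeeping, again on the parsed YAML rather than the file itself. It assumes `dataset_info` contains a `splits` list whose entries carry `name`, `num_bytes`, and `num_examples`, as in a typical auto-generated README; check your own header before relying on it. The helper name and the numbers in the example are made up.

```python
from copy import deepcopy


def fold_split_stats(dataset_info, target, extras):
    """Return a copy of `dataset_info` with the `extras` split stats added to `target`."""
    info = deepcopy(dataset_info)
    by_name = {split["name"]: split for split in info["splits"]}
    for name in extras:
        extra = by_name[name]
        by_name[target]["num_examples"] += extra["num_examples"]
        by_name[target]["num_bytes"] += extra["num_bytes"]
    # Remove the entries that were folded into the target split.
    info["splits"] = [s for s in info["splits"] if s["name"] not in extras]
    return info


# Made-up numbers, for illustration only.
example_info = {
    "splits": [
        {"name": "train", "num_bytes": 1000, "num_examples": 100},
        {"name": "train_part2", "num_bytes": 400, "num_examples": 50},
    ]
}
updated = fold_split_stats(example_info, "train", ["train_part2"])
```

Any totals stored next to `splits` are left untouched, on the assumption that they already cover both splits after the second push.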
              
elsaEU, January 15, 2024, 1:40pm

Thank you, it works fine.