Upload efficiently for lazy split download

Yes, in Parquet (the default) or in WebDataset format.

OK, thanks. I’ll probably lean towards this.


Regarding the names, I already knew that about “calibration”, but I followed the tutorial for manual configuration, with this metadata in my README.md:

```yaml
configs:
  - config_name: default
    data_files:
      - split: train
        path: train/*/*.png
      - split: calibration
        path: calibration/*/*.png
      - split: test
        path: test/*/*.png
```
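To double-check which files land in which split, here is a toy sketch in plain Python. It is not the `datasets` resolver itself, just an approximation under two assumptions: each `*` matches exactly one path segment (it never crosses a `/`), and the class is the parent folder name, which is how I understand the image-folder loader infers labels. The file names are made up.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Split patterns copied from the README config above.
SPLIT_PATTERNS = {
    "train": "train/*/*.png",
    "calibration": "calibration/*/*.png",
    "test": "test/*/*.png",
}


def glob_match(path: str, pattern: str) -> bool:
    """Match segment by segment so that '*' never crosses a '/'."""
    parts = PurePosixPath(path).parts
    pat_parts = PurePosixPath(pattern).parts
    return len(parts) == len(pat_parts) and all(
        fnmatchcase(p, q) for p, q in zip(parts, pat_parts)
    )


def assign(paths):
    """Return {split: [(path, class_name), ...]}; class = parent folder name."""
    out = {split: [] for split in SPLIT_PATTERNS}
    for path in paths:
        for split, pattern in SPLIT_PATTERNS.items():
            if glob_match(path, pattern):
                out[split].append((path, PurePosixPath(path).parent.name))
                break  # first matching split wins in this sketch
    return out


# Hypothetical repository listing.
files = [
    "train/cat/0001.png",
    "train/dog/0002.png",
    "calibration/cat/0003.png",
    "test/dog/0004.png",
    "README.md",
    "train/cat/extra/0005.png",  # one level too deep for train/*/*.png
]
splits = assign(files)
for split, items in splits.items():
    print(split, items)
```

Files that match no pattern (like `README.md` or the nested PNG) are simply left out of every split.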

It works now!

I think I’ll settle on this, and use the `filters` option to leave out specific classes on the fly. I can’t find proper documentation for the format of `filters`, though. If you have a pointer, that’d be lovely!

Again, thank you very much for your help!

All the best.


I edited the original message because I had made a typo in the manual config paths.

Second edit: I still had a typo. Now it seems to work!
