I want to be able to overwrite a split in my dataset. Is there a way to do so?
When I push to an existing split I get this error:
ValueError: Split complexRoofLocation_01Apr2023_to_31May2023test already present
Is there a way to remove a split, without manually going into the dataset?
What’s strange is that datasets, despite the operation erroring out with the ValueError above, still overwrites the split:
Pushing dataset shards to the dataset hub: 100% [.....................] 1/1 [00:00<00:00, 55.04it/s]
This makes you feel like the whole operation failed, but in fact your dataset is now changed. That feels like a bug.
Additional Strange Behavior
While it updates the split’s data, it doesn’t update the split’s recorded information. Because of this, when you pull down the dataset you may end up getting a NonMatchingSplitsSizesError. I do, because the original split had 5 rows, but the one I attempted to overwrite it with had only 4. So the dataset metadata states there are 5 rows, but only 4 exist in the split.
This basically corrupts the data. Either it should let the overwrite happen or it shouldn’t do anything.
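To make the mismatch concrete, here is a minimal sketch in plain Python of the kind of check that I assume fails on load. It is not the library's actual implementation: `check_split_sizes`, the dictionaries, and the split name `test` are hypothetical names made up for illustration, and only the row counts (5 recorded, 4 present) come from my case.

```python
def check_split_sizes(recorded, actual):
    """Return the splits whose actual row count differs from the recorded one.

    Hypothetical helper: `recorded` stands in for the row counts stored in the
    dataset's metadata, `actual` for the row counts of the downloaded data.
    """
    mismatched = {}
    for name, recorded_rows in recorded.items():
        actual_rows = actual.get(name)
        if actual_rows != recorded_rows:
            mismatched[name] = {"recorded": recorded_rows, "actual": actual_rows}
    return mismatched


# Stale metadata still says 5 rows; the overwritten split holds only 4.
recorded_rows = {"test": 5}
actual_rows = {"test": 4}

mismatches = check_split_sizes(recorded_rows, actual_rows)
print(mismatches)
```

If the failed push had also refreshed the recorded row count to 4, a check like this would return an empty result and the dataset would load normally.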
Appreciate you taking the time to read this!