Any workaround for push_to_hub() limits?

I have the same problem. One commit per 50 files is still too frequent for me, and I can't customize the shard size or the number of shards because of the error. I'm using `datasets` version 2.15.0.
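A possible workaround is to skip the automatic sharding entirely: write fixed-size parquet shards yourself and upload the whole folder in a single commit. This is only a sketch, not what `push_to_hub()` does internally; `pick_num_shards`, `push_in_fixed_shards`, and the folder layout are names I made up, and it assumes `datasets` and `huggingface_hub` are installed.

```python
import math
import os

def pick_num_shards(total_bytes: int, target_shard_bytes: int) -> int:
    # Enough shards that each one is at most ~target_shard_bytes.
    return max(1, math.ceil(total_bytes / target_shard_bytes))

def push_in_fixed_shards(ds, repo_id: str, folder: str = "shards", target_mb: int = 500):
    # `ds` is a datasets.Dataset; ds.data.nbytes is the Arrow table's size.
    from huggingface_hub import HfApi  # lazy import; needs `huggingface_hub`
    os.makedirs(folder, exist_ok=True)
    n = pick_num_shards(ds.data.nbytes, target_mb * 1024 * 1024)
    for i in range(n):
        # shard() takes every n-th row, so each shard gets a mix of
        # large and small entries instead of all large ones up front
        ds.shard(num_shards=n, index=i).to_parquet(
            os.path.join(folder, f"part-{i:05d}.parquet")
        )
    # upload_folder pushes all shards in one commit
    HfApi().upload_folder(folder_path=folder, repo_id=repo_id, repo_type="dataset")
```

A side benefit of `shard()`'s strided selection is that the oversized entries at the head of the dataset get spread across all shards instead of concentrating in the first few.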

Most likely it is caused by around 1% of the entries at the beginning of the dataset being much larger than the rest. As a result, at the start of the push the commits are relatively rare and the shard size is reasonably large (around 400-500 MB), but for most of the entries the commits are too frequent and the shards are extremely small (around 500 KB - 2 MB).

Apparently, the pusher infers the optimal number of entries per shard once, at the beginning, and doesn't adjust it as it goes, which is a poor assumption in my case.
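A toy simulation of that effect (not the actual `datasets` internals; the entry sizes and the inference rule are hypothetical): if rows-per-shard is estimated once from the oversized head of the dataset, the shards covering the small tail come out tiny.

```python
TARGET_SHARD_BYTES = 500 * 1024 * 1024  # ~500 MB target per shard

# Hypothetical skewed dataset: 1% big entries (~5 MB each) at the
# front, then small ones (~50 KB each) for the remaining 99%.
entry_sizes = [5 * 1024 * 1024] * 1_000 + [50 * 1024] * 99_000

# Infer rows-per-shard once, from the first entries only.
head_avg = sum(entry_sizes[:1_000]) / 1_000
rows_per_shard = max(1, int(TARGET_SHARD_BYTES / head_avg))

# Apply that fixed row count to the whole dataset.
shard_bytes = [
    sum(entry_sizes[i:i + rows_per_shard])
    for i in range(0, len(entry_sizes), rows_per_shard)
]

print(f"rows per shard: {rows_per_shard}")
print(f"first shard: {shard_bytes[0] / 1024**2:.0f} MB")
print(f"last shard:  {shard_bytes[-1] / 1024**2:.1f} MB")
```

With these numbers the first shard lands near the 500 MB target, while every tail shard is two orders of magnitude smaller, which matches the frequent-tiny-commit behavior described above.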