I’ve been looking at this, and it seems like there really isn’t an easy way.
A lot of the scriptless magic happens in HubDatasetModuleFactoryWithoutScript and LocalDatasetModuleFactoryWithoutScript. But because this happens outside of the builder, it’s difficult to incorporate into a loading script. As an example, let’s say I want to enumerate the data files in my dataset repository. I could use glob for this, which would work locally. Or I could call a function that lists the files in the repo, which would work on the Hub. This is pretty awkward, and there’s a big gap between the fully automatic no-script path and making even a small tweak to it.
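To make the awkwardness concrete, here is a minimal sketch of the two code paths, assuming huggingface_hub is installed for the Hub case. The helper name list_data_files and the "*.csv" default are hypothetical, chosen only for illustration; this is not something the datasets library provides.

```python
import fnmatch
import glob
import os


def list_data_files(source, pattern="*.csv"):
    """Return repo-relative paths of data files whose name matches `pattern`.

    `source` is either a local directory or a Hub dataset repo id.
    Hypothetical helper for illustration, not part of `datasets`.
    """
    if os.path.isdir(source):
        # Local case: a recursive glob over the directory tree.
        matches = glob.glob(os.path.join(source, "**", pattern), recursive=True)
        return sorted(
            os.path.relpath(m, source).replace(os.sep, "/") for m in matches
        )

    # Hub case: list every file in the repo, then filter by file name.
    # Imported lazily so the local path works without huggingface_hub.
    from huggingface_hub import list_repo_files

    files = list_repo_files(source, repo_type="dataset")
    return sorted(
        f for f in files if fnmatch.fnmatch(f.rsplit("/", 1)[-1], pattern)
    )
```

Calling list_data_files("path/to/local/clone") takes the glob branch, while list_data_files("mteb/sts17-crosslingual-sts", "*.py") would take the Hub branch and needs network access. The point is that the script itself has to carry both branches.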
An exacerbating factor is that most datasets with a loading script don’t host the data on Hugging Face. It took me a while to find a simple example of a dataset that has both a loading script and data hosted on the Hub: sts17-crosslingual-sts.py · mteb/sts17-crosslingual-sts at main. Linking to this or a similar simple script somewhere in the documentation would be helpful.