I see. In that case you’d be hard pressed to manage without another library for handling tar.gz, but maybe you don’t want to add more dependencies…
If you’re going to do it with just the Python standard library, datasets, and soundfile, I think it’s going to be hacky…
And you probably don’t want to download the dataset yourself beforehand and upload it to HF, either.
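
For what it’s worth, here is a rough sketch of what the standard-library-only route could look like, using `tarfile` to walk a tar.gz stream and pull out the audio files as raw bytes. The function name `iter_tar_members` and the suffix list are just placeholders I made up, and I’m assuming you already have the archive as a binary file object (how you get that depends on your setup).

```python
import io
import tarfile


def iter_tar_members(fileobj, suffixes=(".wav", ".flac")):
    """Yield (member_name, raw_bytes) for matching files in a .tar.gz stream.

    `iter_tar_members` and `suffixes` are hypothetical names for illustration.
    """
    # Mode "r|gz" reads the archive sequentially, so fileobj does not
    # need to be seekable (e.g. it could be a network response body).
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        for member in tar:
            # Skip directories, links, and files with other extensions.
            if not member.isfile() or not member.name.lower().endswith(suffixes):
                continue
            extracted = tar.extractfile(member)
            if extracted is not None:
                # In stream mode each member must be read before moving on.
                yield member.name, extracted.read()
```

Each payload could then be decoded with something like `soundfile.read(io.BytesIO(data))`, since `soundfile.read` accepts file-like objects, assuming the audio format is one soundfile supports. It works, but as I said, it’s a bit hacky.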