When I load a folder structure containing a train set and multiple test sets (test1, test2), using ds = load_dataset("audiofolder", data_dir="/path/to/directory"), the resulting DatasetDict has a single test split that combines test1 and test2.
That may not be possible with the automatic detection. load_dataset only recognises a fixed set of keywords as split names, so it merges partial matches such as test1 and test2 into a single test split.
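To make the merging concrete, here is a rough sketch of keyword-based split detection. It is not the library's actual code: the keyword table, the separator characters, and the helper names guess_split and group_folders are all assumptions made for illustration, and the real rules may differ between versions.

```python
import re
from collections import defaultdict

# Assumed keyword table, modelled on the standard split names; the real
# library's list and matching rules may differ between versions.
SPLIT_KEYWORDS = {
    "train": ["train", "training"],
    "validation": ["validation", "valid", "dev", "val"],
    "test": ["test", "testing", "eval", "evaluation"],
}


def guess_split(folder_name):
    """Return the standard split a folder name would fall under, or None."""
    for split, keywords in SPLIT_KEYWORDS.items():
        for keyword in keywords:
            # A keyword followed only by digits or separators, e.g. "test1", "test-2"
            if re.fullmatch(rf"{keyword}[-._ 0-9]*", folder_name):
                return split
    return None


def group_folders(folder_names):
    """Group folder names by the split they would be assigned to."""
    groups = defaultdict(list)
    for name in folder_names:
        groups[guess_split(name)].append(name)
    return dict(groups)


print(group_folders(["train", "test1", "test2"]))
```

Under these assumptions test1 and test2 both land in the test group, which matches the behaviour described in the question.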
If you only have three categories, consider renaming the folders to train, validation and test. Each is then treated as a separate split by load_dataset. Otherwise, fall back on something like:
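import os
from datasets import load_dataset
path_to_dir = '/path/to/directory'
# Map each subfolder name to its own split; the ** glob picks up the audio files and any metadata.csv
data_files = {folder: f'{path_to_dir}/{folder}/**' for folder in sorted(os.listdir(path_to_dir)) if os.path.isdir(os.path.join(path_to_dir, folder))}
ds = load_dataset("audiofolder", data_files=data_files)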
As an aside, it seems pretty straightforward to generalise the “detection” of subfolder splits to arbitrary names, does it not? The Trainer is told explicitly which sets are for training and evaluation (and supports multiple evaluation sets via a DatasetDict for evaluation), so that seems compatible with such a change.
Yes, indeed. My guess is that it was designed for the specific purpose of supporting only the three “standard” splits, as that may be how it is most frequently used.
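For the multiple evaluation sets mentioned above, a minimal sketch of how the splits might be separated before handing them to the Trainer. The helper split_train_eval is hypothetical, and plain lists stand in for real Dataset objects so the snippet runs without any audio files.

```python
def split_train_eval(dataset_dict, train_name="train"):
    """Separate the training split from every other split.

    Works on any mapping of split name -> dataset, such as a DatasetDict.
    """
    if train_name not in dataset_dict:
        raise KeyError(f"no split named {train_name!r}")
    eval_sets = {
        name: split for name, split in dataset_dict.items() if name != train_name
    }
    return dataset_dict[train_name], eval_sets


# Stand-in data; with a real DatasetDict the values would be Dataset objects.
splits = {"train": [1, 2, 3], "test1": [4], "test2": [5, 6]}
train_set, eval_sets = split_train_eval(splits)
print(sorted(eval_sets))

# With transformers installed, the result could then be passed on as:
# Trainer(model=..., args=..., train_dataset=train_set, eval_dataset=eval_sets)
```

When eval_dataset is a dictionary, the Trainer evaluates on each entry and prefixes the metric names with the dictionary key, so test1 and test2 would be reported separately.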