Hi, I’m trying to follow this tutorial to fine-tune Whisper.
The difference in my case is that I’m using a dataset that I created with a script, as detailed here.
Now when I load my dataset and try to print it as follows:
from datasets import load_dataset, DatasetDict
common_voice = DatasetDict()
common_voice["train"] = load_dataset("user/dataset-name", split="train", use_auth_token=True, streaming=True)
common_voice["test"] = load_dataset("user/dataset-name", split="test", use_auth_token=True, streaming=True)
print(common_voice)
It returns something like this:
DatasetDict({
    train: <datasets.iterable_dataset.IterableDataset object at 0x7f151f995760>
    test: <datasets.iterable_dataset.IterableDataset object at 0x7f151f9a3d60>
})
However, I’d like to get output like the one shown in the tutorial:
DatasetDict({
    train: Dataset({
        features: [list_of_features],
        num_rows: 6540
    })
    test: Dataset({
        features: [list_of_features],
        num_rows: 2894
    })
})
What am I missing?