.wav
/home/user/myDataset
/home/user/myDF.csv
How can I use Hugging Face datasets to load and split the dataset above, and then train a model with it?
You can load and split the dataset as follows:
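import pandas as pd
from datasets import Dataset, Audio

# Load the CSV; this assumes a "file" column holding the paths to the .wav files
df = pd.read_csv("/home/user/myDF.csv")
ds = Dataset.from_pandas(df)
# Decode the paths as audio, then rename the column to "audio"
ds = ds.cast_column("file", Audio())
ds = ds.rename_column("file", "audio")  # rename_column returns a new Dataset
# 70% train / 30% test, returned as a DatasetDict with "train" and "test" keys
ds_dict_with_splits = ds.train_test_split(test_size=0.3)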
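Casting the "file" column to Audio() only works if each entry is a path that resolves to a real file. If the CSV stores bare file names rather than full paths, something like the sketch below could be run first. It is only an illustration: the helper name resolve_audio_paths is hypothetical, and it assumes the CSV has a "file" column whose relative entries live under the dataset directory.

```python
import csv
from pathlib import Path


def resolve_audio_paths(csv_path, audio_dir, column="file"):
    """Return the CSV rows with `column` made absolute, plus the paths not found on disk.

    Hypothetical helper: assumes `column` holds one audio file path per row.
    """
    audio_dir = Path(audio_dir)
    rows, missing = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            path = Path(row[column])
            # Only prepend the dataset directory to relative entries.
            if not path.is_absolute():
                path = audio_dir / path
            # Record entries that do not point at an existing file.
            if not path.is_file():
                missing.append(str(path))
            row[column] = str(path)
            rows.append(row)
    return rows, missing
```

With the paths from the question, this would be called as rows, missing = resolve_audio_paths("/home/user/myDF.csv", "/home/user/myDataset"). If missing is empty, pd.DataFrame(rows) can be passed to Dataset.from_pandas in place of the raw CSV contents.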
I assume you want to train a speech recognition model. You can find a guide here and the transformers example scripts here (replace their dataset initialization code with your own dataset).