Hi,
I have an audio+transcripts corpus consisting of long audio files. Each audio file has its own metadata file where the time code of each segment is defined along with its transcript.
Is there a way to generate a HF dataset from this structure without having to split the original audio files into single-utterance audio files?
Datasets like AMI (edinburghcstr/ami · Datasets at Hugging Face) and others do have "begin_time" and "end_time" columns, but it doesn't look like those fields are used in the dataset script either…
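To make the question concrete, here is a rough sketch of the behaviour I am after: slicing each long recording in memory from its time codes instead of writing one file per utterance. Everything in it is an assumption for illustration: the function name `iter_utterances`, the field names (`begin_time`, `end_time`, `text`, borrowed from the AMI columns), and the toy sample list standing in for a decoded audio file.

```python
from typing import Dict, Iterator, List, Sequence


def iter_utterances(
    samples: Sequence[float],
    sampling_rate: int,
    segments: List[Dict],
) -> Iterator[Dict]:
    """Yield one example per segment by slicing the long recording in memory."""
    for seg in segments:
        # Convert the time codes (assumed to be in seconds) to sample indices.
        start = int(round(seg["begin_time"] * sampling_rate))
        end = int(round(seg["end_time"] * sampling_rate))
        yield {
            "audio": {
                "array": list(samples[start:end]),
                "sampling_rate": sampling_rate,
            },
            "text": seg["text"],
            "begin_time": seg["begin_time"],
            "end_time": seg["end_time"],
        }


# Toy stand-in for a decoded long recording: 4 seconds at 10 Hz.
sampling_rate = 10
samples = [float(i) for i in range(40)]

# Hypothetical contents of the per-file metadata.
segments = [
    {"begin_time": 0.0, "end_time": 1.5, "text": "hello"},
    {"begin_time": 2.0, "end_time": 4.0, "text": "world"},
]

examples = list(iter_utterances(samples, sampling_rate, segments))
```

I assume a generator like this could be handed to `datasets.Dataset.from_generator`, but I have not checked whether that is practical for large corpora, so a built-in way would be preferable.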
Thanks