Dataset providing additional data for ASR pipelines


I create a pipeline for ASR with:

asr = get_offline_pipeline('automatic-speech-recognition', local_dir)

And then use it for ASR with:

for inference in asr(inference_dataset, batch_size=100): …

For some of the inference data I have the reference transcriptions, which I'd like to pass through the pipeline so I can calculate WER. If I change inference_dataset to return a tuple (audio_data, transcription), the pipeline fails because it does not expect a tuple. Is there any way to do this?
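One possible workaround is to leave the pipeline's input unchanged. Keep each transcription next to its audio in your own list, feed the pipeline only the audio, and pair the outputs back up with the references afterwards. The sketch below assumes that the pipeline yields one dict with a "text" key per input, in the same order as the inputs. The helper names (word_edit_distance, audio_only, wer_with_references) are hypothetical. WER is computed as a plain word-level edit distance summed over the corpus, with no case or punctuation normalization, rather than by any particular metrics library.

```python
from typing import Any, Callable, Iterator, List, Optional, Sequence, Tuple

# Hypothetical sample type: (audio, transcription); transcription is None
# for items that have no reference text.
Sample = Tuple[Any, Optional[str]]


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over words (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (free if words match)
            )
        prev = curr
    return prev[-1]


def audio_only(samples: Sequence[Sample]) -> Iterator[Any]:
    """Yield just the audio, which is all the pipeline expects to receive."""
    for audio, _ in samples:
        yield audio


def wer_with_references(
    asr: Callable, samples: Sequence[Sample], batch_size: int = 100
) -> float:
    """Run asr on the audio only, then score against the kept transcriptions.

    Assumes asr(inputs, batch_size=...) yields one {"text": ...} dict per
    input, in input order, so outputs can be matched to samples by position.
    """
    edits = 0
    ref_words = 0
    outputs = asr(audio_only(samples), batch_size=batch_size)
    for (_, reference), output in zip(samples, outputs):
        if reference is None:  # no ground truth for this item: skip scoring
            continue
        ref = reference.split()
        edits += word_edit_distance(ref, output["text"].split())
        ref_words += len(ref)
    # Corpus-level WER: total word edits divided by total reference words.
    return edits / ref_words if ref_words else 0.0
```

With the real pipeline this would be called as wer_with_references(asr, samples, batch_size=100), where samples is a list of (audio_data, transcription) tuples. Because outputs and references are matched by position, the input should be something with a stable order, such as a list. You may also want to lowercase and strip punctuation from both texts before scoring.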