Batching in "automatic-speech-recognition" pipelines

I am using the following code to send a batch of inputs to the automatic-speech-recognition pipeline:

from transformers import pipeline
from datasets import load_dataset
import numpy as np

ds = load_dataset(
    "hf-internal-testing/librispeech_asr_demo",
    "clean",
    split="validation")

input_data = ds[0]["audio"]["array"]
# stack seven copies of the same clip into a single 2-D array
batch_test = np.vstack((input_data, input_data))
for i in range(5):
    batch_test = np.vstack((batch_test, input_data))

task = "automatic-speech-recognition"
model_name = 'facebook/s2t-small-librispeech-asr'
batch_size = 5

model = pipeline(
    task=task,
    model=model_name,
    batch_size=batch_size)

res = model(batch_test)
res

However, I am receiving the following error, which suggests that the Hugging Face pipeline cannot accept stacked audio inputs and instead treats them as multi-channel audio:

ValueError: We expect a single channel audio input for AutomaticSpeechRecognitionPipeline

Looking at the Hugging Face source, it seems the following line is raising the error above. I couldn't find anything related to preprocessing batched input in the code, so how can I enable batching for input to Hugging Face models?
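
For context, the check appears to be a simple dimensionality test in the pipeline's preprocessing: anything that is not a 1-D array is rejected, so the (7, n_samples) array produced by np.vstack above looks like multi-channel audio to it. A minimal paraphrase of that check, inferred from the error text rather than copied from the exact source:

import numpy as np

inputs = batch_test  # shape (7, n_samples) after the np.vstack calls above

# Mono audio must arrive as a 1-D array; a 2-D array triggers the error
if isinstance(inputs, np.ndarray) and inputs.ndim != 1:
    raise ValueError(
        "We expect a single channel audio input for AutomaticSpeechRecognitionPipeline")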

I know this is an old post, but for future readers: the solution to this problem is to pass a Python list of NumPy arrays (instead of a single stacked NumPy array) to the pipeline:

model([waveform.cpu().numpy() for waveform in batched_waveforms])
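
Applied to the code from the question, that means building a list of 1-D arrays rather than calling np.vstack; the pipeline then iterates over the list and groups items into batches of batch_size internally. A minimal sketch reusing the dataset and model from above:

from transformers import pipeline
from datasets import load_dataset

ds = load_dataset(
    "hf-internal-testing/librispeech_asr_demo",
    "clean",
    split="validation")

# A list of seven 1-D arrays, not one 2-D stacked array
audio_list = [ds[0]["audio"]["array"] for _ in range(7)]

asr = pipeline(
    task="automatic-speech-recognition",
    model="facebook/s2t-small-librispeech-asr",
    batch_size=5)

# The pipeline batches the list items internally according to batch_size
# and returns one result dict per input clip
res = asr(audio_list)
print([r["text"] for r in res])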