Batching in "automatic-speech-recognition" pipelines

I am using the following code to send a batch of inputs to the automatic-speech-recognition pipeline:

from transformers import pipeline
from datasets import load_dataset
import numpy as np

ds = load_dataset(

input_data = ds[0]["audio"]["array"]
batch_test = np.vstack((input_data, input_data))
for i in range(5):
    batch_test = np.vstack((batch_test, input_data))

task = "automatic-speech-recognition"
model_name = 'facebook/s2t-small-librispeech-asr'
batch_size = 5

model = pipeline(

res = model(batch_test)

However, I am receiving the following error, which suggests that the Hugging Face pipeline cannot accept stacked audio inputs and treats them as multi-channel audio:

ValueError: We expect a single channel audio input for AutomaticSpeechRecognitionPipeline

Looking at the Hugging Face code, it seems that the following line raises this error. I couldn’t find anything related to preprocessing batched input in the code. How can I enable batching for inputs to Hugging Face models?
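
One possible workaround, as a minimal sketch: as far as I can tell, the pipeline treats a Python list as a collection of separate inputs and expects each item to be a 1-D array, so a 2-D stacked array is read as one multi-channel clip. Splitting the stacked array into a list of rows and passing batch_size at call time should avoid the error. The helper names split_into_clips and transcribe_batch are hypothetical (not part of transformers), and asr is assumed to be the object returned by pipeline(...).

```python
import numpy as np


def split_into_clips(stacked):
    """Turn a (num_clips, num_samples) array into a list of 1-D float32 arrays.

    Hypothetical helper: each row becomes one single-channel audio input.
    """
    stacked = np.asarray(stacked)
    if stacked.ndim == 1:
        # Already a single clip; wrap it so the caller always gets a list.
        return [stacked.astype(np.float32)]
    return [row.astype(np.float32) for row in stacked]


def transcribe_batch(asr, stacked, batch_size=5):
    """Call an ASR pipeline on a list of clips instead of a 2-D array.

    `asr` is assumed to be the callable returned by transformers.pipeline(...);
    it receives a list of 1-D arrays plus a batch_size keyword argument.
    """
    clips = split_into_clips(stacked)
    return asr(clips, batch_size=batch_size)
```

With the code from the question, this would be called as res = transcribe_batch(model, batch_test, batch_size=batch_size). A list also works for clips of different lengths, which np.vstack cannot represent.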