How to run speech to text from an inference endpoint given an audio file URL?


I’ve provisioned a private inference endpoint for Whisper and would like to retrieve audio files from a URL and transcribe them. I’ve been mostly following this example: Managed Transcription with OpenAI Whisper and Hugging Face Inference Endpoints

Here’s my code sample:

    const result = await fetch(audioURL);
    const blob = await result.blob();
    console.log('result is', blob);

    const secret = <My key>
    const response = await fetch('<My endpoint>', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${secret.api_key}`,
        'Content-Type': blob.type,
      },
      body: JSON.stringify({inputs: blob}),
    });
    console.log('result is', result);
    return await response.json();

The audio URL points to an m4a file.

When I run this example I get the result: “Error: Malformed soundfile”.

I’ve looked up the expected parameters for the automatic speech recognition task here: Supported Transformers & Diffusers Tasks

but the input doesn’t really give me any clues as to what kind of file is expected here.

Any advice? What kind of file is the inference endpoint expecting?

Did you solve this issue?
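
One likely cause, offered as a guess rather than a confirmed diagnosis: `JSON.stringify` cannot serialize a `Blob`, so `JSON.stringify({inputs: blob})` produces the string `{"inputs":{}}`. The endpoint then tries to decode that JSON text as audio, which would explain "Malformed soundfile". The blog example linked in the question sends the audio with curl's `--data-binary`, that is, as the raw request body with an audio `Content-Type`. Below is a minimal sketch of the same idea with `fetch`, assuming Node 18+ or a browser. The helper names `buildTranscriptionRequest` and `transcribeFromUrl` are made up for illustration, and the `audio/x-m4a` fallback is an assumption about which content type the endpoint accepts for m4a files.

```javascript
// Hypothetical helper: builds the fetch options for the endpoint call.
// The audio bytes go into the body as-is, not wrapped in JSON.
function buildTranscriptionRequest(audioBytes, contentType, apiKey) {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      // Assumption: the endpoint chooses its audio decoder from this header,
      // and accepts 'audio/x-m4a' when the download reports no type.
      'Content-Type': contentType || 'audio/x-m4a',
    },
    body: audioBytes,
  };
}

// Hypothetical wrapper: download the audio, then forward the raw bytes.
async function transcribeFromUrl(audioURL, endpointURL, apiKey) {
  const audioResponse = await fetch(audioURL);
  if (!audioResponse.ok) {
    throw new Error(`Audio download failed: ${audioResponse.status}`);
  }
  const blob = await audioResponse.blob();
  const audioBytes = await blob.arrayBuffer();

  const response = await fetch(
    endpointURL,
    buildTranscriptionRequest(audioBytes, blob.type, apiKey),
  );
  if (!response.ok) {
    // Surface the endpoint's error text instead of failing inside response.json().
    throw new Error(`Endpoint returned ${response.status}: ${await response.text()}`);
  }
  return response.json();
}
```

Called as `await transcribeFromUrl(audioURL, '<My endpoint>', secret.api_key)`, this should return the endpoint's JSON response. If the error persists, it may be worth logging `blob.type` to see what content type the download actually reports, since a generic value such as `application/octet-stream` might not be recognized as audio.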