How to run speech-to-text from an Inference Endpoint given an audio file URL?

MR1 · December 31, 2022, 12:03am

Hello!

I’ve provisioned a private Inference Endpoint for Whisper and would like to retrieve audio files from a URL and transcribe them. I’ve been mostly following this example: Managed Transcription with OpenAI Whisper and Hugging Face Inference Endpoints

Here’s my code sample:

    const result = await fetch(audioURL);
    const blob= await result.blob();
    console.log('result is', blob);

    const secret = <My key> 
    const response = await fetch('<My endpoint>', { 
      method: 'POST',
      headers: { 
        Authorization: `Bearer ${secret.api_key}`,
        'Content-Type': blob.type,
      },
      body: JSON.stringify({inputs: blob}),
    });
    console.log('result is', result);
    return await response.json();

The audio URL points to an M4A file.

When I run this example I get the result: “Error: Malformed soundfile”.

I’ve looked up the expected parameters for the automatic speech recognition task here: Supported Transformers & Diffusers Tasks

but the input doesn’t really give me any clues as to what kind of file is expected here.

Any advice? What kind of file is the inference endpoint expecting?

Delwin · June 8, 2023, 5:56am

Did you solve this issue?
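
For anyone hitting the same error, one likely cause is in the snippet above: `JSON.stringify({inputs: blob})` serializes a `Blob` as `{}`, so the request body is `{"inputs":{}}` and no audio bytes ever reach the endpoint, even though the `Content-Type` header says audio. Below is a minimal sketch of an alternative, assuming the endpoint accepts the raw audio bytes as the request body with the audio MIME type as `Content-Type` (the linked Whisper example sends the file this way with curl). The helper names `buildTranscriptionRequest` and `transcribeFromUrl` and the `audio/x-m4a` fallback are hypothetical and not confirmed by this thread; whether the endpoint accepts that content type for M4A would need checking.

```javascript
// Build the fetch() options for sending raw audio bytes to the endpoint.
// Assumption: the endpoint takes the audio file itself as the request body.
function buildTranscriptionRequest(audioBytes, mimeType, apiKey) {
  return {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      // Assumed fallback when the source server reports no type for the file.
      'Content-Type': mimeType || 'audio/x-m4a',
    },
    // Raw bytes, not JSON.stringify(...): a Blob stringifies to "{}".
    body: audioBytes,
  };
}

// Hypothetical helper: download the audio, then forward it to the endpoint.
async function transcribeFromUrl(audioURL, endpointURL, apiKey) {
  const audioResponse = await fetch(audioURL);
  if (!audioResponse.ok) {
    throw new Error(`Audio download failed: ${audioResponse.status}`);
  }
  const blob = await audioResponse.blob();
  const audioBytes = await blob.arrayBuffer();

  const response = await fetch(
    endpointURL,
    buildTranscriptionRequest(audioBytes, blob.type, apiKey),
  );
  if (!response.ok) {
    // Surface the endpoint's own error text to make failures easier to debug.
    throw new Error(`Endpoint returned ${response.status}: ${await response.text()}`);
  }
  return response.json();
}
```

Usage would look like `const result = await transcribeFromUrl(audioURL, '<My endpoint>', secret.api_key);`. If the endpoint still reports a malformed file, the remaining suspects are the content type it is told and whether its audio decoder can read M4A at all.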
