The API call fails with: "You have passed more than 3000 mel input features (> 30 seconds) which automatically enables long-form generation which requires the model to predict timestamp tokens. Please either pass `return_timestamps=True` or make sure to pass no more than 3000 mel input features." The same message is repeated in the response's `warnings` field, prefixed with "There was an inference error:".
I have also tried passing `return_timestamps` as 1, as a boolean True, and as the string "True", but the error still comes up. I can reproduce the error in the UI playground at openai/whisper-large-v3-turbo · Hugging Face as well. Has anyone seen similar behaviour? Can the team please help with this? Many thanks
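For reference, this is roughly how I am sending the request. This is only a sketch: it assumes the serverless endpoint accepts a JSON body with base64-encoded `inputs` and a `parameters` object (as other tasks do), and the token and file name are placeholders. Whether `return_timestamps` is actually forwarded to the model is exactly what seems broken.

```python
import base64
import requests

# Assumed payload shape: base64-encoded audio under "inputs" plus a "parameters"
# object. Whether the backend forwards "return_timestamps" is the open question.
API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3-turbo"
headers = {"Authorization": "Bearer hf_xxx"}  # placeholder token

with open("audio.ogg", "rb") as f:  # placeholder file, >30 s of audio
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "inputs": audio_b64,
    "parameters": {"return_timestamps": True},  # boolean, not 1 or "True"
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```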
I am facing the same problem (whisper-large-v3-turbo, OGG audio). Until yesterday it was working fine, but now any audio longer than 30 s causes the error, and passing `return_timestamps` does not help either. The same issue occurs with whisper-large-v3.
I agree with @TOOTHED; I am seeing the same behaviour here. Files longer than 30 s used to work without any issues but now throw an error, and no keyword or parameter fixes it. Should this be raised with the API team somehow?
I think we can contact the Hugging Face team through any of these. Most Hugging Face components are managed on GitHub, with a few exceptions, so raising an issue on GitHub is the most reliable way to get in touch. However, there are so many repositories that it's hard to know where to write…
Hugging Face documentation and Hub general issues
Library handling Inference API
Whisper Class (Maybe it’s different this time…)
Well, to be honest, this is literally my first comment on the HF forums, so I am not sure how things work here. But overall, yes: since the Whisper models themselves were not updated and the behaviour is the same for all of them, the error is almost certainly caused by changes in the Hugging Face API, either in the AutomaticSpeechRecognition handling (I did not test whether any parameters are forwarded at all) or something deeper. A quick local check like the sketch below would help narrow that down.
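To separate a model regression from an API-layer change, one can run the same model locally with the transformers pipeline, where `return_timestamps=True` is the documented way to enable long-form (>30 s) generation. If this works, the problem is in how the Inference API forwards parameters, not in the model. The audio file name below is a placeholder.

```python
# Local sanity check with transformers (not the Inference API).
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
)

# return_timestamps=True enables the long-form generation path for audio >30 s.
result = pipe("audio.ogg", return_timestamps=True)
print(result["text"])
```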
I have now cross-posted this in three different places and am wondering whether anyone knows how to tag someone from the dev/support team. I suspect this is part of the broader Inference API changes: the behaviour of this Whisper endpoint has changed radically and it is barely usable. The silence from the team on all the open issues here is somewhat worrying.
If it's related to Whisper, the Serverless Inference API, or the servers, then michellehbn, Wauplin, pierric, or victor?
You could also ask meganariley…
The last resort is julien-c.