I have been searching all over the internet, including the official Whisper documentation, but I can't find a way to disable timestamps in Whisper transcripts. I'm using a colab.land project with the following line:
!whisper {input_path} --model large-v2 --language English --output_dir {output_folder} --output_format vtt
Can you help me with this? I’m not a developer myself, so I might have missed something. I have seen other Hugging Face projects where you can choose to activate or deactivate timestamps for the output.
The Whisper pipeline (return_timestamps param) does not have an option to remove timestamps.
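If only the text is needed, one workaround is to remove the cue timings after transcription. The whisper command-line tool also accepts --output_format txt, which writes plain text without cue timings. For an existing .vtt file, here is a minimal sketch in plain Python; the helper name strip_vtt_timestamps and the sample transcript are made up for illustration, and it assumes the simple cue layout that Whisper writes (a timing line followed by the text).

```python
import re

# Matches a WebVTT cue timing line such as "00:00.000 --> 00:04.000"
# or "01:02:03.000 --> 01:02:07.500" (the hours field is optional).
CUE_TIMING = re.compile(
    r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)


def strip_vtt_timestamps(vtt_text: str) -> str:
    """Hypothetical helper: keep only the spoken text of a WebVTT transcript."""
    kept = []
    for line in vtt_text.splitlines():
        line = line.strip()
        # Skip the file header, blank separators, and cue timing lines.
        if not line or line.startswith("WEBVTT") or CUE_TIMING.match(line):
            continue
        kept.append(line)
    return "\n".join(kept)


# Made-up sample in the layout Whisper uses for .vtt output.
sample = """WEBVTT

00:00.000 --> 00:04.000
Hello and welcome.

00:04.000 --> 00:07.500
This is a test.
"""

print(strip_vtt_timestamps(sample))
```

This only post-processes the file; the model still predicts timestamps internally while transcribing.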
But if you use the generate method, you can disable timestamps with the return_timestamps param. This only works if your clips are under 30 seconds, though: for long-form clips, passing return_timestamps=False raises an error, and leaving it unset switches it to True, as is evident from the generate method code here:
def _set_return_timestamps(return_timestamps, is_shortform, generation_config):
    if not is_shortform:
        if return_timestamps is False:
            raise ValueError(
                "You have passed more than 3000 mel input features (> 30 seconds) which automatically enables long-form generation which "
                "requires the model to predict timestamp tokens. Please either pass `return_timestamps=True` or make sure to pass no more than 3000 mel input features."
            )
        logger.info("Setting `return_timestamps=True` for long-form generation.")
        return_timestamps = True
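To make the 30-second limit concrete, here is a small stand-alone sketch of the same rule. It does not call the transformers library; the function name resolve_return_timestamps and both constants are my own, and it assumes "short-form" means at most 3000 mel frames, as the error message above implies (Whisper's front end produces 100 frames per second of audio).

```python
# Assumption: 100 log-mel frames per second of audio
# (16 kHz sampling, hop length 160), so 3000 frames = 30 seconds.
FRAMES_PER_SECOND = 100
MAX_SHORTFORM_FRAMES = 3000


def resolve_return_timestamps(num_mel_frames, return_timestamps=None):
    """Hypothetical stand-alone mirror of the check quoted above."""
    is_shortform = num_mel_frames <= MAX_SHORTFORM_FRAMES
    if not is_shortform:
        if return_timestamps is False:
            # Long-form decoding needs timestamp tokens, so an explicit
            # False is rejected rather than silently overridden.
            raise ValueError("long-form audio requires return_timestamps=True")
        # Unset (None) or True both end up as True for long-form audio.
        return True
    # Short-form audio: the caller's choice is respected.
    return bool(return_timestamps)


# A 15-second clip can run without timestamps.
print(resolve_return_timestamps(15 * FRAMES_PER_SECOND, return_timestamps=False))

# A 45-second clip with the option left unset gets timestamps switched on.
print(resolve_return_timestamps(45 * FRAMES_PER_SECOND))
```

So in practice, disabling timestamps means cutting the audio into chunks of 30 seconds or less before calling generate.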
My motive was also to disable timestamps, but in the hope of getting fewer hallucinations.
The reason it cannot be disabled for clips longer than 30 seconds is that each segment’s decoding depends on the timestamp predicted for the previous segment. Check section 4.5 of the Whisper paper: