Disable timestamps for Whisper

Hi,

I have been searching all over the internet, including the official Whisper documentation, but I can’t find a way to disable timestamps in Whisper transcripts. I’m using a colab.land project with the following line:

!whisper {input_path} --model large-v2 --language English --output_dir {output_folder} --output_format vtt

Can you help me with this? I’m not a developer myself, so I might have missed something. I have seen other Hugging Face projects where you can choose to activate or deactivate timestamps for the output.

Thanks, Kind regards

Hi!

The Whisper pipeline (return_timestamps param) does not have an option to remove timestamps.

But if you use the generate method, you can disable timestamps with the return_timestamps param. This only works if your clips are shorter than 30 seconds: for long-form clips, generate sets return_timestamps to True, and raises an error if you explicitly pass False, as is evident from the generate method code here:

def _set_return_timestamps(return_timestamps, is_shortform, generation_config):
    if not is_shortform:
        if return_timestamps is False:
            raise ValueError(
                "You have passed more than 3000 mel input features (> 30 seconds) which automatically enables long-form generation which "
                "requires the model to predict timestamp tokens. Please either pass `return_timestamps=True` or make sure to pass no more than 3000 mel input features."
            )

        logger.info("Setting `return_timestamps=True` for long-form generation.")
        return_timestamps = True
My motive was also to disable timestamps, in the hope of getting fewer hallucinations.
The reason it cannot be disabled for clips longer than 30 seconds is that a segment’s decoding depends on the timestamp predicted in the previous segment. See section 4.5 of the Whisper paper.
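
One possible workaround, given the 30-second limit, is to cut a long recording into pieces of at most 30 seconds and transcribe each piece separately with timestamps disabled. The sketch below only does the splitting. The function name is hypothetical, and it assumes 16 kHz audio, so 30 seconds is 480,000 samples. Fixed-length cuts can land in the middle of a word, so treat this as a starting point rather than a drop-in replacement for long-form decoding.

```python
SAMPLE_RATE = 16_000                      # Whisper's expected sampling rate (Hz)
MAX_SECONDS = 30                          # short-form limit from the error message
MAX_SAMPLES = SAMPLE_RATE * MAX_SECONDS   # 480,000 samples per clip


def split_into_short_clips(samples, max_samples=MAX_SAMPLES):
    """Split a waveform into consecutive clips of at most max_samples samples."""
    if max_samples <= 0:
        raise ValueError("max_samples must be positive")
    # Plain slicing works for lists and NumPy arrays alike.
    return [samples[start:start + max_samples]
            for start in range(0, len(samples), max_samples)]
```

Each returned clip stays within the short-form limit, so it can be passed to generate with return_timestamps=False.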

Hope it helps!

Regards,
Jay