Help with Whisper chunk_length

Hello,

I need help understanding why the Whisper chunked algorithm should be used when:

  1. Transcription speed is the most important factor.
  2. You are transcribing a single long audio file.
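To make the question concrete, here is a rough sketch of the windowing idea behind chunked inference. A long file is cut into overlapping 30-second windows, and those windows can then be batched together on the GPU. The helper names (`chunk_windows`, `forward_passes`), the 16 kHz sampling rate, the 5-second per-side overlap (`chunk_length_s / 6`), and the assumption that windows from all files share one batching queue are illustrative choices. This is not the transformers implementation.

```python
import math


def chunk_windows(num_samples, sampling_rate=16_000, chunk_length_s=30.0, stride_length_s=None):
    """Hypothetical helper: split one audio signal into overlapping windows.

    Returns a list of (start, end) sample indices. Illustrative only.
    """
    if stride_length_s is None:
        # Assumption: overlap of chunk_length_s / 6 on each side of a window.
        stride_length_s = chunk_length_s / 6
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # how far the window advances each time
    if step <= 0:
        raise ValueError("chunk_length_s must exceed 2 * stride_length_s")
    windows = []
    start = 0
    while True:
        end = min(start + chunk_len, num_samples)
        windows.append((start, end))
        if end == num_samples:  # last window reaches the end of the audio
            break
        start += step
    return windows


def forward_passes(durations_s, batch_size=16, sampling_rate=16_000):
    """Hypothetical estimate: (total windows, batched forward passes).

    Assumes windows from every file are pooled into one queue and
    batched batch_size at a time.
    """
    n_windows = sum(
        len(chunk_windows(int(d * sampling_rate), sampling_rate)) for d in durations_s
    )
    return n_windows, math.ceil(n_windows / batch_size)


print(forward_passes([600.0]))     # one 10-minute file
print(forward_passes([5.0] * 16))  # sixteen 5-second files
```

Under these assumptions, one 10-minute file yields 30 windows, or 2 forward passes at `batch_size=16`. Sixteen 5-second files yield 16 windows in a single pass. However, Whisper's feature extractor pads every input to 30 seconds, so that pass processes 480 s of input for only 80 s of real audio. This is only a back-of-the-envelope comparison, not a measurement.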

I tried it with different batch sizes and several audio files in parallel, and it works fine, but I noticed that the inference time is much longer than when I pass the audio files one by one. I am using an A100 with 80 GB, 10 CPUs, and 60 GB of RAM.

Can anyone explain this?


Whisper is excellent, but it’s also stubborn. It’s difficult to make it faster or parallelize it… but it seems to be possible.

Whisper chunking

Whisper optimizing