I have some data consisting of audio and human-annotated summaries of that audio; I do not, however, have ground-truth transcripts. During the annotation process I used Whisper to transcribe the audio, and the annotators had access to both the transcripts and the audio when writing their summaries. I've trained a summarization model (BART) from transcript to summary, but of course mistranscription errors cascade into the summarization model. Hence I've been thinking about an end-to-end approach to my "audio summarization" problem.
- Stitching together Whisper and BART and training end to end

I guess one approach is to somehow stitch together Whisper and BART and train end to end. I'm not entirely sure how to achieve this, because Whisper's decoding procedure is presumably not differentiable:
Whisper(Audio) → Whisper Decoding → Transcript → BART → Predicted Summary → Loss(Predicted Summary, True Summary)
So I guess the loss won't be able to flow all the way back to the Whisper model? I'm not sure whether there are any tricks to get something like this to work; any tips/tricks would be appreciated if so.
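One standard trick for pushing gradients through a discrete decoding step is the straight-through Gumbel-softmax: the forward pass uses hard one-hot tokens, while the backward pass uses the soft distribution. Below is a toy sketch of the idea, not a working Whisper+BART pipeline; the names and sizes (`whisper_logits`, `bart_embeddings`, `vocab_size`, etc.) are stand-ins I made up. In the real setup the logits would come from Whisper's decoder at each step and the embedding matrix would be BART's input-embedding table, with the hard tokens "embedded" via a matmul rather than an index lookup so the whole path stays differentiable.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, embed_dim, seq_len = 50, 16, 4

# Stand-in for Whisper's per-step decoder logits (in reality these are
# produced by the model, not a free parameter).
whisper_logits = torch.randn(seq_len, vocab_size, requires_grad=True)

# Stand-in for BART's input token-embedding table.
bart_embeddings = torch.nn.Embedding(vocab_size, embed_dim)

# Straight-through Gumbel-softmax: forward pass yields hard one-hot tokens,
# backward pass uses the soft relaxation, so gradients reach the logits.
one_hot = F.gumbel_softmax(whisper_logits, tau=1.0, hard=True)

# Embed the "decoded" tokens by matmul instead of an index lookup,
# keeping the operation differentiable end to end.
soft_embeds = one_hot @ bart_embeddings.weight  # (seq_len, embed_dim)

# Any downstream summarization loss (a dummy sum here) now backpropagates
# through the decoding step into the upstream logits.
loss = soft_embeds.sum()
loss.backward()
print(whisper_logits.grad is not None)  # True: gradient flowed through
```

Caveats worth noting: this sidesteps Whisper's actual beam-search/temperature-fallback decoding (you'd decode greedily step by step), and the Whisper and BART tokenizers differ, so you'd need a shared vocabulary or a projection between them.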
- Teaching Whisper a new task of summarization
Whisper is currently aware of two tasks, "translate" and "transcribe". I was thinking: why not try to teach it a new task, "summarize"? As far as I understand, Whisper's decoder has been pretrained with the <|translate|> and <|transcribe|> task tokens, and I would simply fine-tune with a new <|summarize|> token. As long as I have a sufficient amount of training data, I think this could work.
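Mechanically, adding a <|summarize|> token comes down to growing the decoder's token-embedding table by one row while keeping the pretrained rows intact, then fine-tuning with the new token in the prefix. With Hugging Face transformers this would be `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`; here is a torch-only toy sketch of the resize itself (the vocabulary sizes are made up), avoiding a model download:

```python
import torch

torch.manual_seed(0)

old_vocab, embed_dim = 100, 8

# Stand-in for the pretrained decoder embedding table.
old_embed = torch.nn.Embedding(old_vocab, embed_dim)

# New table with one extra slot for the <|summarize|> task token.
new_embed = torch.nn.Embedding(old_vocab + 1, embed_dim)
with torch.no_grad():
    # Copy the pretrained rows; the new row stays randomly initialised
    # and gets learned during fine-tuning.
    new_embed.weight[:old_vocab] = old_embed.weight

summarize_id = old_vocab  # id of the new <|summarize|> token
print(new_embed(torch.tensor([summarize_id])).shape)  # torch.Size([1, 8])
```

During fine-tuning you would then build decoder prefixes with the new task token in place of <|transcribe|> and train on (audio, summary) pairs directly.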
I'd prefer to implement approach (1) because it's more interpretable: both a transcript and a summary are produced, whereas in (2) only the summary is produced.
I'm curious to hear whether these approaches are viable, and any tips/tricks for making (1) work if possible.