Transformer for Abstractive Summarization for Chats Based on Performance

Hi, I have some general questions about transfer learning with pretrained models for a summarization problem. I’ve been trying to build a Seq2Seq model for summarizing chats between two users.
I’ve tried the T5 model (both pretrained and with transfer learning), but the results were not satisfactory: after training on the custom dataset, the generated summaries missed the context entirely.
Can someone help me understand which model works better for summarizing chats, and whether any pre-processing step should precede this?
Thanks in advance.

Hi @anant0308! Happy to discuss possible approaches, but what works best (and whether you can expect good results at all) will depend on what your fine-tuning data looks like. For example: How long are the chats? Do you have any gold summaries for your chats? Do you have examples of summaries without corresponding chats? How many examples do you have? How are you representing speaker turns?
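Most of these questions can be answered with a quick pass over the data. A minimal sketch, assuming each example is a dict with a newline-separated "dialogue" string and a "summary" string (a hypothetical layout, similar to how SAMSum-style data is often stored). Lengths are counted in whitespace-separated words, not model tokens, so tokenized lengths will usually be larger:

```python
from statistics import mean, median

def chat_stats(examples):
    """Basic statistics for (chat, summary) pairs.

    Assumes each example is a dict with a "dialogue" string whose turns are
    separated by newlines and a "summary" string (hypothetical layout).
    Lengths are whitespace-separated words, not model tokens.
    """
    # Number of non-empty lines (turns) per chat.
    turns = [sum(1 for line in ex["dialogue"].split("\n") if line.strip()) for ex in examples]
    chat_words = [len(ex["dialogue"].split()) for ex in examples]
    summary_words = [len(ex["summary"].split()) for ex in examples]
    # Summary length relative to chat length: lower means more compression.
    ratios = [s / c for s, c in zip(summary_words, chat_words) if c > 0]
    return {
        "num_examples": len(examples),
        "median_turns": median(turns),
        "max_chat_words": max(chat_words),
        "mean_summary_words": mean(summary_words),
        "mean_summary_to_chat_ratio": mean(ratios),
    }
```

On the full dataset, `max_chat_words` and `median_turns` give a first idea of whether the chats fit the input length the model is fine-tuned with (T5 checkpoints are commonly fine-tuned with 512 input tokens), and the ratio shows how aggressive the expected compression is.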

Keep in mind that summarizing chats is quite a different task from summarizing news text: if the pre-training data lacks any kind of dialogue inputs, then the model will have to learn how to interpret multi-turn structure from scratch, which will probably be your main challenge.
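One simple way to make the multi-turn structure visible to a seq2seq model is to linearize the chat with an explicit speaker prefix per turn and a separator between turns. A minimal sketch, assuming the chat is a list of (speaker, message) tuples; the " | " separator is an arbitrary, hypothetical choice (a dedicated special token added to the tokenizer is another option), and whether it helps should be checked empirically:

```python
from itertools import groupby

def linearize_chat(turns, turn_sep=" | "):
    """Flatten a multi-turn chat into a single seq2seq input string.

    `turns` is a list of (speaker, message) tuples. Consecutive messages from
    the same speaker are merged into one turn, so each change of speaker is
    marked exactly once. `turn_sep` is a hypothetical separator string.
    """
    merged = []
    for speaker, group in groupby(turns, key=lambda t: t[0]):
        text = " ".join(msg.strip() for _, msg in group)
        merged.append(f"{speaker}: {text}")
    return turn_sep.join(merged)
```

For example, `[("Anna", "Are you free tonight?"), ("Ben", "Yes."), ("Ben", "What's up?"), ("Anna", "Movie at 8?")]` becomes `Anna: Are you free tonight? | Ben: Yes. What's up? | Anna: Movie at 8?`. Using an explicit separator string keeps turn boundaries visible even if a tokenizer normalizes newlines to spaces.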


Hey @yjernite, the primary challenge, as you mentioned, is identifying the speakers and hence interpreting the structure. The dataset is somewhat similar to the SAMSum corpus.
The following key points might help:

  1. Gold summaries are available.
  2. The chats are similar to normal text messages exchanged between two users.
  3. There are around 15K-20K training examples.
  4. Currently, each speaker is represented as-is, by name.

Please suggest improvements for a better implementation of abstractive summarization. My key questions are:

  1. Is there any preferred model for chat summarization?
  2. Which pre-processing steps might improve performance?
  3. How should speakers be represented? I found that the context can change when a speaker’s name appears inside a sentence (it increases ambiguity).
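For question 3, one option (not the only one) is to replace speaker names with consistent placeholders, both in the speaker field and wherever those names occur inside messages, and then map the placeholders back to names in the generated summary. A minimal sketch; the `#Person1#` placeholder format and all helper names are hypothetical, and names of people who never speak in the chat are left untouched:

```python
import re

def build_mapping(turns):
    """Assign each distinct speaker a placeholder in order of first appearance."""
    mapping = {}
    for speaker, _ in turns:
        if speaker not in mapping:
            mapping[speaker] = f"#Person{len(mapping) + 1}#"  # hypothetical format
    return mapping

def anonymize_text(text, mapping):
    """Replace every whole-word occurrence of a known name with its placeholder."""
    if not mapping:
        return text
    # Longer names first, so a name like "Ann Lee" wins over "Ann".
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)

def anonymize_chat(turns):
    """Return the (placeholder, message) turns and the name -> placeholder mapping."""
    mapping = build_mapping(turns)
    rewritten = [(mapping[speaker], anonymize_text(msg, mapping)) for speaker, msg in turns]
    return rewritten, mapping

def restore_names(text, mapping):
    """Map placeholders in a generated summary back to the original names."""
    for name, placeholder in mapping.items():
        text = text.replace(placeholder, name)
    return text
```

During fine-tuning, the same mapping would also be applied to the gold summary (via `anonymize_text`) so that chat and target use the same placeholders; the closing `#` in the placeholder keeps `#Person1#` from matching inside `#Person10#` when restoring names.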

Any suggestion would be of great help!

Did you ever find an improvement?

I am trying to accomplish the same thing with the SAMSum dataset.