I am working on fine-tuning for text summarization. Over the past few days I have been experimenting with ways to fine-tune GPT-2 for summarization, mainly on datasets like CNN/Daily Mail and XSUM.
I understand that encoder-decoder models (e.g., BART or T5) are usually preferred for these tasks over decoder-only GPT-style models, but I would like to figure out a better way to use GPT-2 or LLaMA here. So far I have not been able to get concise summaries; most of the time the generated summaries simply mimic or copy the input text.
My Steps (a rough code sketch of this setup follows the list):
- Added <|pad|> and <|sep|> special tokens to GPT-2 (for padding and for separating the article from the summary) and prepended the instruction "Write a concise summary".
- Truncated the combined input and output to at most 1024 tokens (GPT-2's context limit).
- Used LoRA with r=16, alpha=32, and dropout 0.1.
- Applied cross-entropy loss with ignore_index=-100 to mask out the prompt and padding tokens.
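
For reference, here is a minimal sketch of the preprocessing, assuming the Hugging Face transformers tokenizer API; the exact prompt template, the 128-token budget reserved for the summary, and the helper name `build_example` are illustrative rather than copied from my actual script:

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
# Register the extra special tokens mentioned above.
tokenizer.add_special_tokens(
    {"pad_token": "<|pad|>", "additional_special_tokens": ["<|sep|>"]}
)

MAX_LEN = 1024        # GPT-2's context limit
SUMMARY_BUDGET = 128  # illustrative split of the 1024-token budget

def build_example(article: str, summary: str) -> dict:
    prompt = f"Write a concise summary: {article} <|sep|> "
    prompt_ids = tokenizer(
        prompt, truncation=True, max_length=MAX_LEN - SUMMARY_BUDGET
    ).input_ids
    summary_ids = tokenizer(
        summary + tokenizer.eos_token, truncation=True, max_length=SUMMARY_BUDGET
    ).input_ids

    input_ids = (prompt_ids + summary_ids)[:MAX_LEN]
    # Mask the prompt so the cross-entropy loss (ignore_index=-100)
    # is computed only on the summary tokens.
    labels = ([-100] * len(prompt_ids) + summary_ids)[:MAX_LEN]

    # Right-pad to MAX_LEN; padded positions are also ignored by the loss.
    pad_len = MAX_LEN - len(input_ids)
    attention_mask = [1] * len(input_ids) + [0] * pad_len
    input_ids = input_ids + [tokenizer.pad_token_id] * pad_len
    labels = labels + [-100] * pad_len
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}
```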
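
And the LoRA side, roughly, using peft; the `target_modules` value reflects my assumption that GPT-2's fused QKV projection (`c_attn`) is the right layer to adapt:

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.add_special_tokens(
    {"pad_token": "<|pad|>", "additional_special_tokens": ["<|sep|>"]}
)

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.resize_token_embeddings(len(tokenizer))  # account for the added tokens

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["c_attn"],  # GPT-2's fused QKV projection (my assumption)
    # Note: under plain LoRA the freshly resized embedding rows stay frozen;
    # modules_to_save could be used to train them if that turns out to matter.
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```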
Could you share some effective ways to approach this with GPT-2 or LLaMA?
Also, are there any interesting loss functions or evaluation metrics worth looking at beyond ROUGE?
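
For context, this is roughly how I score outputs today, using the `evaluate` library's ROUGE wrapper (it needs the rouge_score package installed):

```python
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["generated summary text ..."],
    references=["reference summary text ..."],
)
print(scores)  # rouge1 / rouge2 / rougeL / rougeLsum F-measures
```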
Thanks,
Harshavardhana