Clarification regarding a summarization task using LLaMA or GPT-2 (Medium, Small)

I am working on fine-tuning for text summarization. In recent days, I have been experimenting with different ways to fine-tune GPT-2 for summarization, especially on datasets like CNN/DailyMail and XSum.

Though I have learnt that encoder-decoder models (e.g., BART or T5) are usually preferred over GPT for these tasks, I would like to figure out a better way to make use of GPT or LLaMA here. So far I have not been successful in generating concise summaries; most of the time the generated summaries simply mimic or copy the source text.
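For reference, I generate summaries with a call roughly along these lines (a minimal sketch: the checkpoint path, prompt template, and decoding settings are illustrative placeholders, not my exact configuration):

```python
# Generation sketch; "my-gpt2-summarizer" is a placeholder for a fine-tuned
# (LoRA-merged) checkpoint directory, and the decoding settings are illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("my-gpt2-summarizer")
model = GPT2LMHeadModel.from_pretrained("my-gpt2-summarizer").eval()

article = "..."  # source document to summarize
prompt = f"Write a concise summary\n{article}<|sep|>"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=896)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,        # cap summary length (896 + 128 fits GPT-2's 1024 context)
        num_beams=4,               # beam search
        no_repeat_ngram_size=3,    # discourage long verbatim repetitions
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens, i.e. drop the prompt from the output.
summary = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary)
```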

My Steps:

  1. Used the <|pos|> and <|sep|> special tokens for padding in GPT-2, and also an instructional prefix, “Write a concise summary” (see the preprocessing sketch after this list, which covers steps 1, 2 and 4).

  2. Truncated the input and output to no more than 1024 tokens.

  3. Used LoRA with r=16, alpha=32, and a dropout rate of 0.1 (see the LoRA sketch after this list).

  4. Applied cross-entropy loss with ignore_index=-100 to mask out irrelevant tokens.
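
To make steps 1, 2 and 4 concrete, my preprocessing looks roughly like this (a simplified sketch: the prompt template and the exact padding/truncation details are placeholders on my side; the fields are the CNN/DailyMail ones):

```python
# Preprocessing sketch for steps 1, 2 and 4 (prompt template is a placeholder).
from datasets import load_dataset
from transformers import GPT2TokenizerFast

MAX_LEN = 1024
PREFIX = "Write a concise summary"

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-medium")
tokenizer.add_special_tokens({"pad_token": "<|pos|>", "additional_special_tokens": ["<|sep|>"]})

def preprocess(example):
    # Step 1: instruction prefix + article, then <|sep|>, then the reference summary.
    prompt = f"{PREFIX}\n{example['article']}<|sep|>"
    prompt_ids = tokenizer(prompt, truncation=True, max_length=MAX_LEN - 128)["input_ids"]
    summary_ids = tokenizer(example["highlights"])["input_ids"] + [tokenizer.eos_token_id]

    # Step 2: cap the combined sequence at 1024 tokens.
    input_ids = (prompt_ids + summary_ids)[:MAX_LEN]
    # Step 4: compute the loss only on summary tokens (-100 is ignored by cross-entropy).
    labels = ([-100] * len(prompt_ids) + summary_ids)[:MAX_LEN]
    attention_mask = [1] * len(input_ids)

    # Pad everything to MAX_LEN so examples batch cleanly.
    pad_len = MAX_LEN - len(input_ids)
    input_ids += [tokenizer.pad_token_id] * pad_len
    labels += [-100] * pad_len
    attention_mask += [0] * pad_len
    return {"input_ids": input_ids, "labels": labels, "attention_mask": attention_mask}

train_ds = load_dataset("cnn_dailymail", "3.0.0", split="train")
tokenized_train = train_ds.map(preprocess, remove_columns=train_ds.column_names)
```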

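And step 3, the LoRA setup, continuing from the sketch above (tokenizer and tokenized_train come from there; the target modules and training arguments shown here are illustrative, not my exact values):

```python
# Step 3: LoRA setup, continuing from the preprocessing sketch above.
from transformers import GPT2LMHeadModel, Trainer, TrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
model.resize_token_embeddings(len(tokenizer))  # account for the added <|pos|> and <|sep|> tokens

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["c_attn"],  # GPT-2's fused query/key/value projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="gpt2-summarization-lora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized_train,
)
trainer.train()
```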
Could you share some ways to approach this more effectively with LLaMA or GPT-2?

Also, are there any interesting loss functions or evaluation metrics worth looking into beyond ROUGE?

Thanks,
Harshavardhana
