Keyword generation using T5

JulesBelveze · May 6, 2022, 8:54am

Hey folks,

I am trying to fine-tune a T5 for some sort of keyword generation.

Because of the way I created my dataset (extracting keywords from a summary of the actual text), the gold keywords I have might not be present in the actual text. For this reason a token classification task would not work, hence my idea of text-to-text generation!
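A quick way to see how often the gold keywords are missing from the text is to measure the fraction that appear verbatim. This is only a minimal sketch, assuming plain case-insensitive substring matching; the function name `keyword_coverage` and the sample text are made up for illustration.

```python
import re
from typing import List


def keyword_coverage(text: str, keywords: List[str]) -> float:
    """Return the fraction of keywords found verbatim in text (case-insensitive)."""
    if not keywords:
        return 0.0
    # Collapse whitespace so multi-word keywords match across line breaks.
    normalized_text = re.sub(r"\s+", " ", text.lower())
    found = 0
    for keyword in keywords:
        normalized_kw = re.sub(r"\s+", " ", keyword.lower().strip())
        if normalized_kw and normalized_kw in normalized_text:
            found += 1
    return found / len(keywords)


# Hypothetical example: two of the three keywords occur in the text.
text = "Stocks fell sharply as the banking crisis spread."
print(keyword_coverage(text, ["stocks", "crisis", "finance"]))
```

A low average coverage over the dataset would support the text-to-text framing; a coverage close to 1.0 would suggest that token classification is still an option.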

Ideally this would work like:

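>>> input_text = "question: What are the keywords? context: [DOCUMENT]"
>>> inputs = tokenizer(input_text, return_tensors="pt")  # generate() expects tensors
>>> outputs = model.generate(**inputs)
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # decode the first sequence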
finance - crisis - stocks
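For fine-tuning, each document would need to be paired with a target string in the same format as the output above. Here is a rough sketch of how the pairs could be built and how a generated string could be parsed back into a list. The prompt and the " - " separator come from the example above; the helper names `build_example` and `parse_keywords` and the dictionary keys are assumptions for illustration only.

```python
from typing import Dict, List

PROMPT = "question: What are the keywords? context: "
SEPARATOR = " - "


def build_example(document: str, keywords: List[str]) -> Dict[str, str]:
    """Build one (input, target) text pair for seq2seq fine-tuning."""
    return {
        "input_text": PROMPT + document,
        "target_text": SEPARATOR.join(keywords),
    }


def parse_keywords(generated: str) -> List[str]:
    """Split a generated string into unique, lowercased keywords, keeping order."""
    seen = set()
    result = []
    # Split on the spaced separator so hyphenated keywords stay intact.
    for part in generated.split(SEPARATOR):
        keyword = part.strip().lower()
        if keyword and keyword not in seen:
            seen.add(keyword)
            result.append(keyword)
    return result


example = build_example("Markets tumbled today.", ["finance", "crisis", "stocks"])
print(example["target_text"])
print(parse_keywords("finance - crisis - stocks"))
```

Since the target is just a delimited string, the parsing step also drops empty or repeated items, which a generative model may produce.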

My main concern is that LMs are trained to generate syntactically correct sentences. Would this be problematic?
Has anyone tried a similar task, or does anyone have a suggestion for tackling this problem from a different angle?

Cheers!

ahadda5 · July 14, 2022, 11:14am

Interested to know how this one is coming along?

iratxeMoya · September 12, 2022, 9:04am

Which model are you using: a QA model or a Seq2SeqLM model? I’m trying to do something similar…

JulesBelveze · September 12, 2022, 9:21am

I actually tried both, and neither led to promising results (I didn’t really dig much, tbh).

loghai · November 2, 2022, 4:41am

Hi, so text2text is not returning promising results. Should I try token classification if my keywords are present in the context?
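If the keywords do appear verbatim in the context, token classification needs per-token labels derived from them. Below is a minimal sketch of one way to produce BIO tags, assuming whitespace tokenization and exact case-insensitive matching; the label names `B-KEY`/`I-KEY` and the function `bio_labels` are hypothetical, and a real pipeline would still have to align these labels with the model's subword tokens.

```python
from typing import List


def bio_labels(tokens: List[str], keywords: List[str]) -> List[str]:
    """Tag each token as B-KEY, I-KEY or O based on exact keyword matches."""
    labels = ["O"] * len(tokens)
    lowered = [token.lower() for token in tokens]
    for keyword in keywords:
        kw_tokens = keyword.lower().split()
        n = len(kw_tokens)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(label == "O" for label in labels[i:i + n])
            if lowered[i:i + n] == kw_tokens and window_free:
                labels[i] = "B-KEY"
                labels[i + 1:i + n] = ["I-KEY"] * (n - 1)
    return labels


# Hypothetical example: "finance" is not in the text, so it yields no labels.
tokens = "The banking crisis hit stocks hard".split()
for token, label in zip(tokens, bio_labels(tokens, ["banking crisis", "stocks", "finance"])):
    print(token, label)
```

Keywords that are missing from the context are silently skipped here, which is exactly the limitation described in the first post.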
