Solution for Graph to Multiple sentences / Paragraph generation

Hey! I'm a beginner in NLP and have a task where I need to generate a paragraph from graphs. For example, if a paragraph has 5 sentences, I have 5 linearized graphs for them; these are the input to a model that should generate the paragraph.
The linearized graphs are strings, and I plan to use a sequence-to-sequence transformer for this Graph2Text problem.

I tried two different approaches:
Approach 1:
I concatenated all the linearized graphs (i.e. one single string containing the information from all the graphs) in the right order and passed that to a pre-trained T5 model (I also tried BART), but it only generates about 20 words even when the target paragraph has 40-50 words. I'm not sure why this doesn't work, because it works really well for 1-graph-to-1-sentence generation.
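For reference, here is a minimal sketch of what Approach 1 looks like at inference time. The checkpoint name and the toy graph strings are only placeholders, and BART drops in the same way via AutoModelForSeq2SeqLM.

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# one linearized graph per target sentence, concatenated in sentence order
graphs = ["( ent1 rel1 ent2 )", "( ent2 rel2 ent3 )"]  # placeholder linearizations
source = " ".join(graphs)

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
# without an explicit max_length, generate() falls back to the default generation
# length (about 20 tokens), which is what cuts the paragraphs short
output_ids = model.generate(**inputs, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))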

Approach 2:
The idea is to use the encoder and decoder of a pretrained model separately. If each paragraph has 5 sentences, I will have 5 graphs; I pass each graph through the encoder and get a representation for each graph.
I then pass these representations through a Bi-LSTM so that the order of the sentences is preserved, and finally pass them to the decoder for paragraph generation. The problem is that the encoder output is not a plain tensor but a model output object, and I'm not sure how to integrate the Bi-LSTM layer in between.
Also, do you think Approach 2 will have the same problem of generating only ~20 words as Approach 1, given that the pretrained models are the same and we are only adding a Bi-LSTM in between?
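For concreteness, here is a rough, untested sketch of the wiring I have in mind, assuming T5-small, placeholder graph strings, and a randomly initialized Bi-LSTM (in a real run it would be trained jointly with the rest of the model). The handling of the encoder's output object via .last_hidden_state and BaseModelOutput is exactly the part I'm unsure about, so treat it as a guess rather than a verified recipe.

import torch.nn as nn
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

graphs = ["( ent1 rel1 ent2 )", "( ent2 rel2 ent3 )"]  # placeholder linearizations, one per sentence
enc_in = tokenizer(graphs, return_tensors="pt", padding=True)

# the encoder returns a ModelOutput object; the tensor inside it is .last_hidden_state
enc_out = model.encoder(input_ids=enc_in.input_ids,
                        attention_mask=enc_in.attention_mask)
hidden = enc_out.last_hidden_state            # (num_graphs, seq_len, d_model)

# Bi-LSTM over the graph representations; hidden size is d_model // 2 so the
# concatenated forward/backward states come back out at d_model
d_model = model.config.d_model
bilstm = nn.LSTM(d_model, d_model // 2, batch_first=True, bidirectional=True)
lstm_out, _ = bilstm(hidden)                  # still (num_graphs, seq_len, d_model)

# flatten the per-graph sequences into one long "document" sequence and wrap it
# back into a BaseModelOutput so generate() accepts it as precomputed encoder output
doc_states = lstm_out.reshape(1, -1, d_model)
doc_mask = enc_in.attention_mask.reshape(1, -1)

paragraph_ids = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=doc_states),
    attention_mask=doc_mask,
    max_length=256,
    num_beams=4,
)
print(tokenizer.decode(paragraph_ids[0], skip_special_tokens=True))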

I would really appreciate any suggestions on the above approaches, or other ways to tackle this problem.

@sachin

I got a solution for the first approach. The outputs were apparently being capped by the models' default generation length (around 20 tokens). Since I was using the Hugging Face Trainer API for the T5 and BART models, I just needed to pass generation_max_length=1024 in the trainer arguments.

For example:

from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

training_args = Seq2SeqTrainingArguments(
    output_dir="my_model",
    evaluation_strategy="epoch",
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=3,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
    generation_max_length=1024,  # raise the ~20-token default generation cap so full paragraphs fit
    generation_num_beams=4       # beam search during evaluation/prediction
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

This did the job for me and I can generate paragraphs now.
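A side note in case it helps anyone: the same cap applies outside the Trainer too. When generating manually, you can lift it by passing the limit to generate() directly, e.g. model.generate(**inputs, max_length=1024, num_beams=4) (or max_new_tokens in more recent transformers versions).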