Controlled Text Generation

I am trying to perform in-context learning with GPT-Neo and I have noticed that it’s hard to get the text generation pipeline to complete just a single line. More specifically, suppose I have the following prompt:

Give a compliment about a topic:

Topic: Soccer
Compliment: You are so good at soccer

Topic: Cooking
Compliment: I love your cooking

Topic: Public Speaking

Now when I ask GPT to continue using the text generation pipeline, I often get much more than the “Compliment” line completed. If the max length is large enough, it will go on to make up its own Topics and Compliments. Is there any way to make it stop after completing a single line?


Hey :wave:

In the generate() method, you can set the EOS token id (eos_token_id). I’d guess that if you set this parameter to the id of \n, it would generate only a single line.
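A minimal sketch of that idea. I’m assuming the smallest GPT-Neo checkpoint (EleutherAI/gpt-neo-125M) purely for illustration; the `\n` id lookup relies on GPT-Neo using a GPT-2-style BPE tokenizer, where the newline is a single token:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-125M"  # assumption: swap in your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = (
    "Give a compliment about a topic:\n\n"
    "Topic: Soccer\nCompliment: You are so good at soccer\n\n"
    "Topic: Cooking\nCompliment: I love your cooking\n\n"
    "Topic: Public Speaking\nCompliment:"
)

# id of the newline token (a single token in GPT-2-style BPE vocabularies)
newline_id = tokenizer.encode("\n")[0]

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=30,
    eos_token_id=newline_id,              # stop as soon as a newline is generated
    pad_token_id=tokenizer.eos_token_id,  # silence the missing-pad-token warning
    do_sample=False,
)

# decode only the newly generated tokens, not the prompt
completion = tokenizer.decode(
    output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion.strip())
```

Since generation halts on the first newline token, the decoded completion contains at most a trailing newline, which the final `strip()` removes.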

But I’m not sure the TextGenerationPipeline exposes this parameter; a workaround in that case would be to use a regex after generation to keep everything before the first \n.
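The post-processing workaround can be a one-liner. Here is a small hypothetical helper (the `first_line` name and the sample output string are my own, just for illustration):

```python
import re


def first_line(generated: str, prompt: str) -> str:
    """Keep only the first line of the model's continuation.

    `generated` is the full pipeline output (prompt + continuation);
    `prompt` is the original prompt string.
    """
    continuation = generated[len(prompt):]
    # everything before the first newline; re.split also handles the
    # case where no newline was generated at all
    return re.split(r"\n", continuation, maxsplit=1)[0].strip()


prompt = "Topic: Public Speaking\nCompliment:"
# a made-up pipeline output that rambles on with its own Topics
output = prompt + " You speak with such confidence\nTopic: Chess\nCompliment: ..."
print(first_line(output, prompt))  # -> You speak with such confidence
```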


Hi @josephgatto,

you can refer to this paper, which addresses the problem:

Generating Datasets with Pretrained Language Models (EMNLP 2021)

They used an artificial eos_token (the end quotation mark), which is similar to what @YannAgora mentioned.

However, the model may not always generate the eos_token as you wish, especially if it isn’t fine-tuned. In that case, you can simply discard any sequence that doesn’t produce the eos_token within max_len.
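That filtering step might look like the sketch below. The helper name `keep_if_terminated` and the default end-quotation eos_token are my own choices, mirroring the artificial eos_token idea from the paper:

```python
from typing import Optional


def keep_if_terminated(continuation: str, eos_token: str = '"') -> Optional[str]:
    """Return the text before the artificial eos_token, or None if the
    model never produced it (so the sequence can be discarded)."""
    head, sep, _ = continuation.partition(eos_token)
    return head if sep else None


# a generation that produced the eos_token is kept (truncated at it) ...
print(keep_if_terminated('great job on the talk" Topic: Chess'))  # -> great job on the talk
# ... while one that ran out of max_len without it is dropped
print(keep_if_terminated("great job on the talk and also"))  # -> None
```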