Controlled Text Generation

I am trying to perform in-context learning with GPT-Neo, and I have noticed that it's hard to get the text generation pipeline to complete just a single line. More specifically, suppose I have the following prompt:

Give a complement about a topic: 

Topic: Soccer
Complement: You are so good at soccer

Topic: Cooking
Complement: I love your cooking 

Topic: Public Speaking
Complement: 

Now when I ask GPT-Neo to continue the prompt using the text generation pipeline, I often get much more than the “complement” line completed. If the max length is large enough, it goes on to make up its own Topics and Complements. Is there any way to make it stop after completing a single line?

Thanks!

Hey :wave:

In the generate() method, you can set the EOS token id (eos_token_id). I guess that if you set this parameter to the id of the \n token, generation would stop after a single line.
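
As a rough illustration of that idea, here is a minimal sketch that passes the newline token id to generate() as eos_token_id. The helper names complete_single_line and demo are made up for this example, and the checkpoint name in demo is only an assumption about which GPT-Neo size is in use. The sketch also assumes that "\n" is encoded as exactly one token, which holds for the GPT-2 style vocabulary that GPT-Neo uses.

```python
def complete_single_line(model, tokenizer, prompt, max_new_tokens=30):
    """Return only the text generated up to the first newline token."""
    # Assumption: "\n" maps to exactly one token, as it does in GPT-2 style
    # byte-level BPE vocabularies such as the one GPT-Neo uses.
    newline_ids = tokenizer.encode("\n")
    if len(newline_ids) != 1:
        raise ValueError("expected '\\n' to be a single token")
    newline_id = newline_ids[0]

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        eos_token_id=newline_id,  # stop once the newline token is produced
        pad_token_id=newline_id,  # no pad token is defined, so reuse the stop id
    )

    # generate() returns prompt tokens followed by new tokens; keep the new ones.
    prompt_length = len(inputs["input_ids"][0])
    new_token_ids = output_ids[0][prompt_length:]
    return tokenizer.decode(new_token_ids, skip_special_tokens=True).strip()


def demo():
    """Hypothetical driver; downloads the checkpoint on first use."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "EleutherAI/gpt-neo-1.3B"  # assumed checkpoint, swap in your own
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    prompt = (
        "Give a complement about a topic:\n\n"
        "Topic: Soccer\nComplement: You are so good at soccer\n\n"
        "Topic: Cooking\nComplement: I love your cooking\n\n"
        "Topic: Public Speaking\nComplement:"
    )
    print(complete_single_line(model, tokenizer, prompt))
```

Call demo() to try it. One caveat: the vocabulary also contains tokens that include a newline without being the single "\n" token (for example a double newline), so the model can occasionally step past the stop id. Trimming the decoded text at the first newline afterwards is still a useful safeguard.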

But I’m not sure whether the TextGenerationPipeline exposes this parameter. If it doesn't, a workaround would be to use a regex after generation to keep everything before the first \n.
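
The regex workaround could look roughly like the sketch below. The function name first_line_of_completion is hypothetical. It assumes the generated text starts with the prompt, which is how the pipeline returns text by default, and it falls back to treating the whole string as the continuation otherwise.

```python
import re


def first_line_of_completion(generated_text, prompt):
    """Keep only the part of the continuation before the first newline."""
    # Strip the prompt if the generated text echoes it back.
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text

    # "[^\n]*" always matches, possibly with an empty string, so the result
    # is everything up to (but not including) the first newline.
    return re.match(r"[^\n]*", continuation).group(0).strip()


prompt = "Topic: Public Speaking\nComplement:"
generated = (
    prompt
    + " You are a great speaker\n\n"
    + "Topic: Dancing\nComplement: Nice moves"
)
print(first_line_of_completion(generated, prompt))  # You are a great speaker
```

If the model starts its continuation with a newline, this returns an empty string, which is a handy signal that the sample should be regenerated.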

Hi @josephgatto,

You can refer to this paper, which addresses the problem:

Generating Datasets with Pretrained Language Models (EMNLP 2021)

They used an artificial eos_token (the closing quotation mark), which is similar to what @YannAgora mentioned.

However, the model may not always generate the eos_token as you wish, especially if it’s not fine-tuned. In that case, you can simply discard any sequence that doesn’t produce the eos_token within max_len.
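
A minimal sketch of that discard step, working on already decoded continuations (prompt removed). The name keep_terminated and the sample strings are invented for illustration, and the terminator is a parameter so it can be a newline or a closing quotation mark.

```python
def keep_terminated(continuations, terminator="\n"):
    """Keep continuations that contain the terminator, cut off at it."""
    kept = []
    for text in continuations:
        head, found, _rest = text.partition(terminator)
        # `found` is empty when the terminator never appeared, meaning the
        # sample ran into the length limit first, so it is discarded.
        if found:
            kept.append(head.strip())
    return kept


samples = [
    "You speak so clearly\nTopic: Dancing",  # terminated, kept
    "You always hold the room and never",    # hit the length limit, dropped
    "Great talk!\n",                         # terminated, kept
]
print(keep_terminated(samples))
```

Sampling several continuations per prompt and filtering them this way trades a bit of extra generation for outputs that are all complete lines.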