Is it possible to generate GPT-2 output without an input prompt?

Hi,

So, as the title says, I want to generate text without using any prompt text, just based on what the model learned from the training dataset. I tried giving a single space as the input prompt, but it did not work.

This is what I tried:

prompt_text = ' '

encoded_prompt = tokenizer.encode(prompt_text, add_special_tokens=False, return_tensors="pt")
output_sequences = model.generate(
    input_ids=encoded_prompt,
    max_length=50 + len(encoded_prompt[0]),
    temperature=0.7,
    top_k=0,
    top_p=0.9,
    repetition_penalty=1.0,
    do_sample=True,
    num_return_sequences=5,
)

and got the error:

RuntimeError: cannot reshape tensor of 0 elements into shape [-1, 0] because the unspecified dimension size -1 can be any value and is ambiguous

Thanks

You can wrap your samples in special tokens, e.g. <|startoftext|> and <|endoftext|>.
Then you can prompt the model by feeding it <|startoftext|> and stop the generation at <|endoftext|>.
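
A minimal sketch of what that could look like, assuming the stock gpt2 checkpoint from transformers (substitute your own fine-tuned model). Note that <|endoftext|> already exists in GPT-2's vocabulary, but <|startoftext|> does not, so it has to be registered as a special token and the embeddings resized before fine-tuning:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# <|startoftext|> is not in the stock vocabulary: register it and resize the embeddings
tokenizer.add_special_tokens({"bos_token": "<|startoftext|>"})
model.resize_token_embeddings(len(tokenizer))

# ... fine-tune on samples wrapped as "<|startoftext|> ... <|endoftext|>" ...

# At generation time, seed with <|startoftext|> and stop once <|endoftext|> is produced
input_ids = tokenizer.encode("<|startoftext|>", return_tensors="pt")
output_sequences = model.generate(
    input_ids=input_ids,
    max_length=50,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_sequences[0], skip_special_tokens=True))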

Thanks, so will this be the correct input:

prompt_text = '<|startoftext|> '

encoded_prompt = tokenizer.encode(prompt_text, add_special_tokens=False, return_tensors="pt")
output_sequences = model.generate(
    input_ids=encoded_prompt,
    max_length=50 + len(encoded_prompt[0]),
    temperature=0.7,
    top_k=0,
    top_p=0.9,
    repetition_penalty=1.0,
    do_sample=True,
    num_return_sequences=5,
)

Because doing this gives me a lot of other random tokens in some of the generated outputs such as:

<|startoftext|>cringe|<end of text|>cringe|<|last text|>cringe|<|end of text|>cringe|<|last text|>cringe|<|last text|>cringe|

and

<|startoftext|> 6 years old; i remember this well, remember its a good thing when you’re old enough to remember that there’s something bette

Why are the outputs of much poorer quality than when I provide an input prompt text?

I’ve never had this problem. Did you run the fine-tuning again with the samples now wrapped in those tokens?

No, I didn’t do that. Would I have to re-run the full training and change the dataset so that every sentence starts with this token?

You don’t have to re-run the entire training, just your fine-tuning.
Yes, you’d have to change the dataset you’re training with to include those tokens.
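
Something along these lines, as a rough sketch (the file names are placeholders; adapt it to however your dataset is stored):

# Wrap every sample in the start/end tokens before fine-tuning
# (train_raw.txt / train_wrapped.txt are placeholder names)
with open("train_raw.txt", encoding="utf-8") as f_in, \
        open("train_wrapped.txt", "w", encoding="utf-8") as f_out:
    for line in f_in:
        text = line.strip()
        if text:
            f_out.write("<|startoftext|>" + text + "<|endoftext|>\n")

Then point your fine-tuning script at the wrapped file, register <|startoftext|> with the tokenizer as shown earlier, and prompt with <|startoftext|> at generation time.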