Whenever I generate text, the input prompt is included in the output. When the input is close to the model's maximum length, the model produces almost no useful new output.
When using transformers.pipeline, or a model loaded with from_pretrained, the model only generates (echoes back) the input when the input is long. For example,
import transformers

generator = transformers.pipeline('text-generation', model='gpt2')
prompt = "really long text that is 1023 tokens ..."
output = generator(prompt, max_length=1024, do_sample=True, temperature=0.9)
output in this case would be equal to the input prompt.
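My understanding (an assumption on my part, based on the generate docs) is that max_length counts the prompt tokens as well as the newly generated ones, so with GPT-2's 1024-token context a ~1023-token prompt leaves room for only one new token:

```python
# Hypothetical numbers illustrating why the output is almost all prompt:
# GPT-2's context window is 1024 tokens, and `max_length` is a cap on
# prompt tokens plus generated tokens combined, not on new tokens alone.
context_window = 1024
prompt_tokens = 1023  # my prompt is roughly this long
room_for_new_tokens = context_window - prompt_tokens
print(room_for_new_tokens)  # → 1
```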
Here is a Colab notebook with simple examples of the problem. I am looking to generate output from inputs of ~1300 tokens and am running into this issue consistently. Is there a way around this?