Whenever I generate text, the input prompt is included in the output. When the input is close to the model's maximum length, the model produces almost no useful new output.
When using transformers.pipeline, or a model loaded with from_pretrained, the model only generates (echoes back) the input when the input is long. For example,
import transformers

generator = transformers.pipeline('text-generation', model='gpt2')
prompt = "really long text that is 1023 tokens ..."
output = generator(prompt, max_length=1024, do_sample=True, temperature=0.9)
output in this case would be equal to the input prompt.
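My understanding (an assumption on my part, based on the generate docs) is that max_length counts the prompt tokens as well as the newly generated ones, so with GPT-2's 1024-token context a ~1023-token prompt leaves room for only one new token:

```python
# Hypothetical numbers illustrating why the output is almost all prompt:
# GPT-2's context window is 1024 tokens, and `max_length` is a cap on
# prompt tokens plus generated tokens combined, not on new tokens alone.
context_window = 1024
prompt_tokens = 1023  # my prompt is roughly this long
room_for_new_tokens = context_window - prompt_tokens
print(room_for_new_tokens)  # → 1
```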
Here is a Colab notebook with simple examples of the problem. I am looking to generate output from inputs of ~1300 tokens and am running into this issue consistently. Is there a way around this?