How does the GPT-J inference API work?

Try the `max_length` parameter instead of `max_new_tokens`. Both control the length of the generated text (`max_length` counts the prompt plus the generated tokens, while `max_new_tokens` counts only the newly generated tokens), and the documentation advises against setting both at once. That worked for me.
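A minimal sketch of how such a request payload could be assembled. The endpoint URL and the `inputs`/`parameters` payload shape follow the Hugging Face Inference API; `build_payload` is a hypothetical helper added here for illustration:

```python
import json

# Hosted Inference API endpoint for GPT-J (illustrative).
API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-j-6B"

def build_payload(prompt, max_length=None, max_new_tokens=None):
    """Build the JSON payload for a text-generation request.

    Set either max_length (prompt + generated tokens) or
    max_new_tokens (generated tokens only) -- not both at once.
    """
    if max_length is not None and max_new_tokens is not None:
        raise ValueError("Set max_length or max_new_tokens, not both")
    parameters = {}
    if max_length is not None:
        parameters["max_length"] = max_length
    if max_new_tokens is not None:
        parameters["max_new_tokens"] = max_new_tokens
    return {"inputs": prompt, "parameters": parameters}

payload = build_payload("Once upon a time", max_length=50)
print(json.dumps(payload))
```

The payload can then be sent with something like `requests.post(API_URL, headers={"Authorization": f"Bearer {token}"}, json=payload)`, assuming a valid API token.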
