"What’s the Difference Between max_length and max_new_tokens?"

You’re encountering a warning in your code:
UserWarning: Neither max_length nor max_new_tokens has been set, max_length will default to 20 (generation_config.max_length). Controlling max_length via the config is deprecated and max_length will be removed from the config in v5 of Transformers – we recommend using max_new_tokens to control the maximum length of the generation.
The key difference is what each parameter counts: max_length limits the total sequence length (prompt tokens plus generated tokens), while max_new_tokens limits only the tokens generated after the prompt. Because max_new_tokens is independent of prompt length, it is the recommended way to control output length. To resolve the warning, pass max_new_tokens explicitly to the model.generate() method.

Here’s how you can modify your code:
outputs = model.generate(input_ids, max_new_tokens=100) # Set to desired value
Adjust max_new_tokens to whatever limit suits your use case. This mirrors how the OpenAI playground defaults to generating 256 tokens per response even though the underlying model supports a much larger context window (roughly 4,000 tokens for the original GPT-3 models).
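To see why max_new_tokens is the safer choice, here is a minimal sketch of how each parameter translates into a generation budget. The generation_budget helper is hypothetical (not part of the transformers API); it only illustrates the arithmetic the library performs internally.

```python
def generation_budget(prompt_len, max_length=None, max_new_tokens=None):
    """Return how many new tokens generation may produce (illustrative only)."""
    if max_new_tokens is not None:
        # max_new_tokens is a fixed budget, independent of the prompt length
        return max_new_tokens
    if max_length is not None:
        # max_length caps prompt + output, so the budget shrinks as the prompt grows
        return max(0, max_length - prompt_len)
    # No setting at all falls back to the deprecated default of max_length = 20
    return max(0, 20 - prompt_len)

# With a 15-token prompt, max_length=20 leaves room for only 5 new tokens,
# while max_new_tokens=100 always allows 100 regardless of prompt length.
print(generation_budget(15, max_length=20))       # 5
print(generation_budget(15, max_new_tokens=100))  # 100
```

Note that with max_length alone, a prompt of 20 or more tokens leaves a budget of zero, which is exactly the kind of silent truncation the warning is steering you away from.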

Summary of Changes:

  • Replace max_length with max_new_tokens in your model.generate() call.
  • Adjust max_new_tokens to your desired output length based on your use case.

This should resolve the warning and give you more control over the output length.