I’m currently working on a generation task using transformers and have encountered an issue. I’m using the NSQL-2B model, and the generation configuration is set as follows:
I got the following warning when generating the text:
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.
Followed by this error:
index 2048 is out of bounds for dimension 0 with size 2048
The model's max length is 2048.
I’m looking for advice or solutions on how to handle this. Should I adjust the max_new_tokens parameter, or is there another way to manage this limitation without compromising the generation quality? Any insights or experiences with similar issues would be greatly appreciated.
I have the same issue, too. In fact, setting max_new_tokens to 2048 is the key reason you are hitting this. When you call output = model.generate(**input, …), the output includes the input, which counts toward the overall length. For example, if your input is “what is your name”, that is 4 tokens plus 2 special tokens (bos/eos); those 6 tokens plus your max_new_tokens of 2048 add up to 2054 (2048 + 6). Since the model's output length is limited, the excess tokens cause this issue and throw errors. I advise lowering max_new_tokens to 1024 and, most importantly, making sure your input does not exceed 1024 tokens if your model's maximum is 2048. For me, I define two parameters:
- model_max_length (set on the tokenizer): restricts the input length, e.g. to 1024.
- max_new_tokens (passed to model.generate): restricts the output length, e.g. to 1024.
That way the overall length never exceeds your model's maximum token count, e.g. 2048 for opt-1.3b or 4096 for llama.
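The budget arithmetic above can also be enforced dynamically instead of with two fixed halves. A minimal sketch (clamp_max_new_tokens is a hypothetical helper name, not a transformers API):

```python
def clamp_max_new_tokens(input_len: int, requested_new_tokens: int,
                         model_max_length: int = 2048) -> int:
    """Return a max_new_tokens value that keeps input + new tokens in bounds.

    input_len: number of tokens in the prompt (including special tokens).
    requested_new_tokens: the max_new_tokens you would like to use.
    model_max_length: the model's context window (2048 for NSQL-2B here).
    """
    budget = model_max_length - input_len
    if budget <= 0:
        raise ValueError(
            f"Input ({input_len} tokens) already fills the "
            f"{model_max_length}-token context window"
        )
    # Never ask for more new tokens than the remaining budget allows.
    return min(requested_new_tokens, budget)


# Example from the thread: 6 input tokens, max_new_tokens=2048 requested.
safe = clamp_max_new_tokens(6, 2048)  # -> 2042, keeping the total at 2048
```

You would then pass the clamped value as max_new_tokens=safe to model.generate, where input_len can come from input["input_ids"].shape[-1] after tokenizing the prompt.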
I hope my experience can help you solve the issue.