How to set 'max_length' properly when using pipeline?

I am trying to use pipeline and want to set the maximum length for both the tokenizer and the generation process. However, if I try:


from transformers import pipeline

# tokenizer and model are loaded elsewhere
prompt = 'What is the answer of 1 + 1?'
pipe = pipeline(
    "text-generation",
    tokenizer=tokenizer,
    model=model,
    do_sample=True,
    truncation=True,
    padding='max_length',
    num_return_sequences=2,
    temperature=1.0,
    num_beams=1,
    max_length=1024,
)
messages = [
    {"role": "user", "content": prompt},
]
ret = pipe(messages)

I get the following error message:

ValueError: Input length of input_ids is 1024, but `max_length` is set to 1024. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.

Therefore, I set max_new_tokens according to the error message, as follows:

prompt = 'What is the answer of 1 + 1?'
pipe = pipeline(
    "text-generation",
    tokenizer=tokenizer,
    model=model,
    do_sample=True,
    truncation=True,
    padding='max_length',
    num_return_sequences=2,
    temperature=1.0,
    num_beams=1,
    max_length=1024,
    max_new_tokens=512,
)
messages = [
    {"role": "user", "content": prompt},
]
ret = pipe(messages)

However, I then always get this warning:

Both `max_new_tokens` (=512) and `max_length`(=1024) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)

I wonder what the correct way is to set the max_length parameter for both the tokenizer and model.generate(…). Should I use alternative arguments in pipeline?


If you are not familiar with the LLM generation options, it usually works better to stick with the defaults and only set these options when you really need them.
It is faster and more reliable to truncate the output string to 1024 characters in Python than to try to limit the length of the LLM output itself.

If I understand correctly, you do not recommend using pipeline?


No, I do recommend it: the pipeline is a preset that does a lot of work on your behalf, unless you need to make detailed adjustments.
However, controlling the output length with max_length doesn’t work very well, so it’s faster to let the model output more and then trim the result with Python string processing, rather than trying to specify the length exactly.
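For example, here is a minimal sketch of that approach (assuming pipe is built with the defaults, model and tokenizer are already loaded, and a plain string prompt is passed so that generated_text comes back as a string; the 512-token cap is just illustrative):

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Let the model generate freely, capping only the number of new tokens.
out = pipe(prompt, max_new_tokens=512)

# Trim the final answer in Python instead of fighting max_length.
text = out[0]["generated_text"][:1024]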

@jiaweihuang

prompt = 'What is the answer of 1 + 1?'
pipe = pipeline(
    "text-generation",
    tokenizer=tokenizer,
    model=model,
    do_sample=True,
    truncation=True,
    padding='max_length',
    num_return_sequences=2,
    temperature=1.0,
    num_beams=1,
    max_length=1024,
    max_new_tokens=512,
)
messages = [
    {"role": "user", "content": prompt},
]
-ret = pipe(messages)
+ret = pipe(messages, max_length=1024)

Also, I'm pretty sure all of these parameters are used at generation time (when calling the pipeline), not at initialization.
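A minimal sketch of that call-time style (assuming model and tokenizer are already loaded; max_new_tokens is used here instead of max_length, which also avoids the precedence warning):

from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "What is the answer of 1 + 1?"},
]

# Pass the generation parameters when calling the pipeline, not when building it.
ret = pipe(
    messages,
    do_sample=True,
    temperature=1.0,
    num_return_sequences=2,
    max_new_tokens=512,  # limits only the newly generated tokens
)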
