How to set 'max_length' properly when using pipeline?

I am trying to use pipeline and want to set the maximum length for both the tokenizer and the generation process. However, if I try:


from transformers import pipeline

# tokenizer and model are loaded elsewhere
prompt = 'What is the answer of 1 + 1?'
pipe = pipeline(
    "text-generation",
    tokenizer=tokenizer,
    model=model,
    do_sample=True,
    truncation=True,
    padding='max_length',
    num_return_sequences=2,
    temperature=1.0,
    num_beams=1,
    max_length=1024,
)
messages = [
    {"role": "user", "content": prompt},
]
ret = pipe(messages)

I get the following error message:

ValueError: Input length of input_ids is 1024, but `max_length` is set to 1024. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.

Therefore, I set max_new_tokens according to the error message, as follows:

prompt = 'What is the answer of 1 + 1?'
pipe = pipeline(
    "text-generation",
    tokenizer=tokenizer,
    model=model,
    do_sample=True,
    truncation=True,
    padding='max_length',
    num_return_sequences=2,
    temperature=1.0,
    num_beams=1,
    max_length=1024,
    max_new_tokens=512,
)
messages = [
    {"role": "user", "content": prompt},
]
ret = pipe(messages)

However, I then always get this warning:

Both `max_new_tokens` (=512) and `max_length`(=1024) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)

I wonder what the correct way is to set the max_length parameter for both the tokenizer and model.generate(…). Should I use alternative arguments in pipeline?


If you are not familiar with the LLM generation options, it usually works better to stick with the defaults and only set these options when you really need them.
It is faster and more reliable to truncate the output string to 1024 characters in Python than to try to limit the length of the LLM output itself.

If I understand correctly, you do not recommend using pipeline?


No, I do recommend it: the pipeline is a preset that does a lot of work on your behalf, unless you need to make detailed adjustments.
However, controlling the output length with max_length doesn’t work very well, so it’s faster to let the model output more and then trim the result with Python string processing, rather than trying to specify the length exactly.
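For example, here is a minimal sketch of that approach (assuming pipe is built with the defaults, model and tokenizer are already loaded, and a plain string prompt is passed so that generated_text comes back as a string; the 512-token cap is just illustrative):

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Let the model generate freely, capping only the number of new tokens.
out = pipe(prompt, max_new_tokens=512)

# Trim the final answer in Python instead of fighting max_length.
text = out[0]["generated_text"][:1024]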

@jiaweihuang

prompt = 'What is the answer of 1 + 1?'
pipe = pipeline(
    "text-generation",
    tokenizer=tokenizer,
    model=model,
    do_sample=True,
    truncation=True,
    padding='max_length',
    num_return_sequences=2,
    temperature=1.0,
    num_beams=1,
    max_length=1024,
    max_new_tokens=512,
)
messages = [
    {"role": "user", "content": prompt},
]
-ret = pipe(messages)
+ret = pipe(messages, max_length=1024)

Also, I'm pretty sure all of these parameters are used at generation time (when calling the pipeline), not at initialization.
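A minimal sketch of that call-time style (assuming model and tokenizer are already loaded; max_new_tokens is used here instead of max_length, which also avoids the precedence warning):

from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {"role": "user", "content": "What is the answer of 1 + 1?"},
]

# Pass the generation parameters when calling the pipeline, not when building it.
ret = pipe(
    messages,
    do_sample=True,
    temperature=1.0,
    num_return_sequences=2,
    max_new_tokens=512,  # limits only the newly generated tokens
)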
