How to generate multiple text completions per prompt (like vLLM) using HuggingFace Transformers Pipeline without triggering an error?

brando · May 12, 2024, 2:17am

error:

error: ` Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.`

fix:

def pipeline_tests_():
    print(f'\n--> pipeline_tests_()')
    import torch
    from transformers import pipeline

    # pipe = pipeline(model="gpt2", device_map="auto", model_kwargs={"load_in_8bit": True})
    pipe = pipeline(model="gpt2", device_map="auto", model_kwargs={"load_in_4bit": True})

    # output = pipe("This is a cool example!", do_sample=True, top_p=0.95, temperature=0.8, max_length=50)
    output = pipe("This is a cool example!", do_sample=True, top_p=0.95, temperature=0.8, max_length=50, truncation=True)
    print(f'\n{output=}')
    print(f'{len(output)=}')

    output = pipe("This is a cool example!", do_sample=True, top_p=0.95, temperature=0.8, max_length=50, num_return_sequences=4, truncation=True)
    print(f'\n{output=}')
    print(f'{len(output)=}')

    output = pipe("This is a cool example!", do_sample=False, top_p=0.95, temperature=0.8, max_length=50, num_return_sequences=4, num_beams=5, truncation=True)
    print(f'\n{output=}')
    print(f'{len(output)=}')

    print()

Topic		Replies	Views
Conversational pipeline by huggingface transformer taking too long to generate output 🤗Transformers	0	842	September 27, 2023
Pipeline vs model.generate() Beginners	11	14110	July 16, 2025
Handle long generation in text generation pipeline 🤗Transformers	0	509	June 16, 2023
Get text generation in particular format Research	5	312	April 18, 2024
Text generation using LLAMA3 Beginners	0	845	July 24, 2024

How to generate multiple text completions per prompt (like vLLM) using HuggingFace Transformers Pipeline without triggering an error?

Related topics