How to set generation parameters for transformers.pipeline?

I can’t figure out the correct way to set the config / generation config parameters for transformers.pipeline (temperature, max_new_tokens, torch_dtype, device_map, etc.).

from transformers import pipeline
pipe = pipeline(
    model=hf_model_id,
    temperature=0.1,
)

If I just pass the arguments to pipeline, I get: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see

But when I try:

from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline
hf_model_id = "eachadea/vicuna-7b-1.1"
model = AutoModelForCausalLM.from_pretrained(hf_model_id)
tokenizer = AutoTokenizer.from_pretrained(hf_model_id, legacy=False)
generation_config, unused_kwargs = GenerationConfig.from_pretrained(
    hf_model_id, max_new_tokens=200, temperature=0.1, return_unused_kwargs=True
)
model.generation_config = generation_config
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

This usually doesn’t throw the warning, but the pipeline uses the default config instead of whatever I put in model.generation_config (for example, instead of 200 max_new_tokens, it just uses the default of 20 tokens).

cc @joaogante

I have the same question


Hi vivio,

Try passing the generation parameters directly to pipe as keyword arguments when you call it:
pipe(prompt, max_new_tokens=200, temperature=0.1)
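To make that concrete, here is a minimal sketch rather than a definitive recipe. It assumes the load-time options (torch_dtype, device_map) go to the pipeline(...) constructor and the decoding options (max_new_tokens, temperature) go to the call itself. The helper names load_time_kwargs, generation_kwargs and run are hypothetical, device_map="auto" assumes the accelerate package is installed, and do_sample=True is my addition on the assumption that you want temperature to have an effect (it is ignored under greedy decoding).

```python
# Sketch only: the helper names here are hypothetical, not part of transformers.

def load_time_kwargs():
    """Options that affect how the model is loaded; passed to pipeline(...)."""
    # Assumption: "auto" values are acceptable; device_map="auto" needs accelerate.
    return {"torch_dtype": "auto", "device_map": "auto"}


def generation_kwargs(max_new_tokens=200, temperature=0.1):
    """Options that affect decoding; passed to pipe(prompt, ...) at call time."""
    return {
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "do_sample": True,  # assumption: sampling is wanted so temperature applies
    }


def run(prompt, model_id="eachadea/vicuna-7b-1.1"):
    """Build a text-generation pipeline and generate a completion for the prompt."""
    # Imported lazily so the helpers above can be used without loading transformers.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id, **load_time_kwargs())
    outputs = pipe(prompt, **generation_kwargs())
    return outputs[0]["generated_text"]


# Example call (downloads the model weights on first use):
# print(run("Hello, my name is"))
```

Keeping the two groups of arguments in separate places also makes it easy to change decoding settings per call without reloading the model.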

I believe it should work; look at the example for TextGenerationPipeline,
where “do_sample=True” is being passed to the pipeline instance.

GenerationConfig.from_pretrained also works together with a model instance created with AutoModelForCausalLM. You can then call:
model.generate(**inputs, generation_config=generation_config)
where inputs is the prompt after it has been tokenized.
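A rough sketch of that second route, under a few assumptions: the model repository ships a generation_config.json (otherwise GenerationConfig.from_pretrained will fail), do_sample=True is added by me so that temperature takes effect, and the names generate_text and new_tokens are hypothetical helpers. Since generate() on a causal LM returns the prompt tokens followed by the continuation, the sketch slices the prompt off before decoding.

```python
# Sketch only: generate_text and new_tokens are hypothetical helper names.

def new_tokens(output_ids, prompt_length):
    """Drop the prompt tokens that a causal LM echoes back before its continuation."""
    return output_ids[prompt_length:]


def generate_text(prompt, model_id="eachadea/vicuna-7b-1.1",
                  max_new_tokens=200, temperature=0.1):
    """Tokenize the prompt, generate with an explicit GenerationConfig, decode."""
    # Imported lazily so new_tokens can be used without loading transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Load the repo's generation config and override the fields we care about.
    generation_config = GenerationConfig.from_pretrained(
        model_id,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True,  # assumption: sampling is wanted so temperature applies
    )

    # The prompt has to be tokenized before it is handed to generate().
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, generation_config=generation_config)

    prompt_length = inputs["input_ids"].shape[1]
    continuation = new_tokens(output_ids[0], prompt_length)
    return tokenizer.decode(continuation, skip_special_tokens=True)


# Example call (downloads the model weights on first use):
# print(generate_text("Hello, my name is"))
```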

Let me know if it worked.