Hi,
I am trying to run inference on the model lmsys/vicuna-7b-v1.5. I load the model and set up the pipeline/prompting as follows:
import torch
import transformers
from transformers import AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.5"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model_pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    max_new_tokens=max_new_toks,  # max_new_toks is set earlier in my script
    device_map="auto",
)

# There are other traditional LLM inference settings that can be modified here
sequences = model_pipeline(
    input_prompt,  # input_prompt is my formatted prompt string
    do_sample=True,
)
Even though I set do_sample to True, I get numerous warnings that say:
UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
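In case it is relevant, this is how I was planning to inspect the defaults the model ships with (a minimal sketch, assuming the values mentioned in the warning come from the generation_config.json stored alongside the model on the Hub):

from transformers import GenerationConfig

# Load the generation defaults stored in the model repo on the Hub.
# I am assuming this is where the top_p=0.6 in the warning comes from.
gen_config = GenerationConfig.from_pretrained("lmsys/vicuna-7b-v1.5")
print(gen_config)  # shows do_sample, top_p, temperature, etc.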
Am I doing something incorrectly when setting do_sample? When the model is loaded, does it set all inference parameters to the values in the model’s associated generation_config.json file? How would you recommend resolving this? Thank you!