Hi,
I am trying to run inference on the model lmsys/vicuna-7b-v1.5. I load the model and set up the pipeline/prompting as follows:
import torch
import transformers
from transformers import AutoTokenizer

model_name = "lmsys/vicuna-7b-v1.5"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model_pipeline = transformers.pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    max_new_tokens=max_new_toks,  # max_new_toks is set earlier in my script
    device_map="auto",
)

# There are other traditional LLM inference settings that can be modified here
sequences = model_pipeline(
    input_prompt,  # input_prompt is my formatted prompt string
    do_sample=True,
)
Even though I set do_sample to True, I get numerous warnings that say:
UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
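In case it is relevant, this is how I was planning to inspect the defaults the model ships with (a minimal sketch, assuming the values mentioned in the warning come from the generation_config.json stored alongside the model on the Hub):

from transformers import GenerationConfig

# Load the generation defaults stored in the model repo on the Hub.
# I am assuming this is where the top_p=0.6 in the warning comes from.
gen_config = GenerationConfig.from_pretrained("lmsys/vicuna-7b-v1.5")
print(gen_config)  # shows do_sample, top_p, temperature, etc.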
Am I doing something incorrectly when setting do_sample? When the model is loaded, does it set all inference parameters to the values in the model’s associated generation_config.json file? How would you recommend resolving this? Thank you!