I ran with `flash_attention_2`. Here are the outputs with `return_dict_in_generate` on and off:
Off:
Loading checkpoint shards: 100%|████████████████| 2/2 [00:03<00:00, 1.60s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:515: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
[
    {'role': 'user', 'content': 'Hello how are you?'},
    {'role': 'assistant', 'content': " Hello! I'm doing well. How about you? How can I help today?"}
]
On:
Loading checkpoint shards: 100%|█████████████| 2/2 [00:03<00:00, 1.65s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py:1283: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration )
  warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:515: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
[
    {'role': 'user', 'content': 'Hello how are you?'},
    {'role': 'assistant', 'content': " Hello! I'm doing well. How about you? How can I help you today? Hello! I'm just a computer program, but I'm functioning optimally. Thank you for asking! How can I assist you?"}
]
The generated sequence is consistently longer when `return_dict_in_generate` is set to `True`.
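For what it's worth, here is a minimal sketch of how I count the generated tokens in both cases so the comparison is apples-to-apples. `new_token_count` is a hypothetical helper, not a transformers API; it only assumes that with `return_dict_in_generate=True` the result exposes a `.sequences` attribute, while otherwise `generate` returns the token-id sequences directly:

```python
def new_token_count(generate_output, prompt_len):
    """Count tokens generated beyond the prompt.

    Handles both shapes of `model.generate(...)` output:
    - return_dict_in_generate=False: a (batch, seq_len) array of token ids
    - return_dict_in_generate=True:  a ModelOutput-like object whose
      `.sequences` holds that same array
    """
    # Fall back to the raw output when there is no `.sequences` attribute.
    seqs = getattr(generate_output, "sequences", generate_output)
    # Compare only the first sequence in the batch.
    return len(seqs[0]) - prompt_len


# Quick sanity check with stand-ins for the two output shapes:
class FakeDictOutput:
    sequences = [[0, 1, 2, 3, 4]]

print(new_token_count(FakeDictOutput(), 2))   # dict-style output
print(new_token_count([[0, 1, 2, 3, 4]], 2))  # plain id sequences
```

Both calls should report the same count; if the two runs really produce different lengths under greedy decoding, that difference is in the generation itself, not in how the output is unpacked.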