> You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to call `tokenizer.padding_side = 'left'` before tokenizing the input.
This is the problem: the error is raised even though `padding_side` is already set to `'left'`.
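For reference, setting left padding before tokenizing is the standard fix the error message asks for. A minimal sketch (the model name here is just an example):

```python
from transformers import AutoTokenizer

# Left padding keeps the real tokens at the end of each row,
# which is what batched generation expects.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
tokenizer.padding_side = "left"

batch = tokenizer(
    ["Hello", "A much longer prompt goes here"],
    padding=True,
    return_tensors="pt",
)
```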
Temp Solution (if `tokenizer.padding_side = 'left'` didn't work):
=============> find `transformers/models/qwen2/modeling_qwen2.py` and change or comment out the check around line ~622, like so:
```python
if self.config._attn_implementation == "flash_attention_2":
    if attention_mask is not None and past_key_values is not None:
        is_padding_right = attention_mask[:, -1].sum().item() != input_tensor.size()[0]
        if is_padding_right:
            # Warn instead of raising so batched generation can proceed.
            # (`continue` is invalid here since there is no enclosing loop.)
            print("warning: right padding detected; skipping Qwen2 Flash Attention check")
            # raise ValueError(
            #     "You are attempting to perform batched generation with padding_side='right'"
            #     " this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to "
            #     " call `tokenizer.padding_side = 'left'` before tokenizing the input. "
            # )
    if attention_mask is not None and 0.0 in attention_mask:
        return attention_mask
    return None
```
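If you're not sure where that file lives in your environment, printing the module's `__file__` attribute shows the exact path to edit:

```python
# Print the installed location of modeling_qwen2.py so you know which file to patch.
import transformers.models.qwen2.modeling_qwen2 as qwen2_modeling

print(qwen2_modeling.__file__)
```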