> You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to call `tokenizer.padding_side = 'left'` before tokenizing the input.
This is the problem: the error is raised even though `padding_side` is already set to `'left'`.
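For reference, setting left padding before tokenizing is the standard fix the error message asks for. A minimal sketch (the model name here is just an example):

```python
from transformers import AutoTokenizer

# Left padding keeps the real tokens at the end of each row,
# which is what batched generation expects.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
tokenizer.padding_side = "left"

batch = tokenizer(
    ["Hello", "A much longer prompt goes here"],
    padding=True,
    return_tensors="pt",
)
```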
Temp Solution (if `tokenizer.padding_side = 'left'` didn't work):
=============> find `transformers/models/qwen2/modeling_qwen2.py` and change or comment out the check around line ~622, like so:
```python
if self.config._attn_implementation == "flash_attention_2":
    if attention_mask is not None and past_key_values is not None:
        is_padding_right = attention_mask[:, -1].sum().item() != input_tensor.size()[0]
        if is_padding_right:
            # Warn instead of raising so batched generation can proceed.
            # (`continue` is invalid here since there is no enclosing loop.)
            print("warning: right padding detected; skipping Qwen2 Flash Attention check")
            # raise ValueError(
            #     "You are attempting to perform batched generation with padding_side='right'"
            #     " this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to "
            #     " call `tokenizer.padding_side = 'left'` before tokenizing the input. "
            # )
    if attention_mask is not None and 0.0 in attention_mask:
        return attention_mask
    return None
```
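If you're not sure where that file lives in your environment, printing the module's `__file__` attribute shows the exact path to edit:

```python
# Print the installed location of modeling_qwen2.py so you know which file to patch.
import transformers.models.qwen2.modeling_qwen2 as qwen2_modeling

print(qwen2_modeling.__file__)
```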