Extracting attention mask from Qwen model

Hi. I’m trying to extract the various attention masks from the output of the Qwen/Qwen2-VL-7B-Instruct model. Here is an overview of what I’m doing:

import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

self.model_id = "Qwen/Qwen2-VL-7B-Instruct"
# Load the model in half precision, sharded across available devices
self.base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    self.model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True,
    cache_dir=self.cache_dir,
)
self.processor = AutoProcessor.from_pretrained(self.model_id)

# Preprocess the images and prompts into a padded batch, move it to GPU 0,
# and cast the floating-point tensors to float16 (input_ids stay integer)
raw_input = self.processor(
    images=images,
    text=prompts,
    return_tensors='pt',
    padding=True
).to(0, torch.float16)

raw_outputs = self.base_model.generate(**raw_input, max_new_tokens=200)
# batch_decode is equivalent to decoding each sequence in a loop
outputs = self.processor.batch_decode(raw_outputs, skip_special_tokens=True)
return outputs

How can I alter this to provide attention masks? Thanks.


I’ve altered my generation call as follows:

raw_outputs = self.base_model.generate(
    **raw_input,
    max_new_tokens=200,
    output_attentions=store_attention,
    return_dict_in_generate=True,
)
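
One thing to watch: with return_dict_in_generate=True, generate() returns a structured output object rather than a plain tensor of token IDs, so the decoding step has to read from its sequences field. A minimal sketch of the adjusted decode, assuming the rest of the code above is unchanged:

# generate() now returns a GenerateDecoderOnlyOutput-style object;
# .sequences holds the token IDs, .attentions the attention weights
outputs = self.processor.batch_decode(
    raw_outputs.sequences, skip_special_tokens=True
)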

Now I just need to work out how to interpret the attention outputs.
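
For reference, output_attentions returns the attention weights the model computed during generation, not the padding attention_mask (that one is already in raw_input). Per the transformers docs, raw_outputs.attentions is a tuple with one entry per generated token, and each entry is a tuple with one tensor per decoder layer. A rough sketch of inspecting it, assuming store_attention was True:

attentions = raw_outputs.attentions
print(len(attentions))     # number of generated tokens
print(len(attentions[0]))  # number of decoder layers
# Each tensor has shape (batch_size, num_heads, query_len, key_len);
# at step 0 query_len spans the whole prompt, afterwards it is 1
print(attentions[0][0].shape)
print(attentions[1][0].shape)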
