Hi. I’m trying to extract the various attention masks from the output of the Qwen/Qwen2-VL-7B-Instruct model. Here is an overview of what I’m doing:
```python
self.model_id = "Qwen/Qwen2-VL-7B-Instruct"
self.base_model = Qwen2VLForConditionalGeneration.from_pretrained(
    self.model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True,
    cache_dir=self.cache_dir,
)
self.processor = AutoProcessor.from_pretrained(self.model_id)

raw_input = self.processor(
    images=images,
    text=prompts,
    return_tensors="pt",
    padding=True,
).to(0, torch.float16)

outputs = []
raw_outputs = self.base_model.generate(**raw_input, max_new_tokens=200)
for raw_output in raw_outputs:
    outputs.append(self.processor.decode(raw_output, skip_special_tokens=True))
return outputs
```
How can I alter this to provide attention masks? Thanks.
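For context, I believe `generate` can return per-layer attention weights when called with `output_attentions=True` and `return_dict_in_generate=True` (and the padding mask itself is already in `raw_input["attention_mask"]` from the processor). Below is a minimal sketch of that pattern on a tiny stand-in text model (`hf-internal-testing/tiny-random-gpt2`, used here only so the example is quick to run; it is not Qwen2-VL) — I'm unsure whether the same keywords behave identically for `Qwen2VLForConditionalGeneration.generate`, which is essentially my question:

```python
# Sketch: asking generate() for attention weights.
# Uses a tiny stand-in model so the example runs quickly; I'm assuming
# the same kwargs carry over to Qwen2VLForConditionalGeneration.generate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hf-internal-testing/tiny-random-gpt2"  # stand-in, not Qwen2-VL
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello there", return_tensors="pt")

gen = model.generate(
    **inputs,
    max_new_tokens=3,
    do_sample=False,
    return_dict_in_generate=True,  # return a structured output, not a bare tensor
    output_attentions=True,        # include per-step, per-layer attention weights
)

# gen.sequences plays the role of raw_outputs above for decoding:
text = tokenizer.decode(gen.sequences[0], skip_special_tokens=True)

# gen.attentions: one tuple per generated token; each entry is a tuple of
# per-layer tensors of shape (batch, num_heads, query_len, key_len).
print(len(gen.attentions))         # number of generation steps
print(gen.attentions[0][0].shape)  # first step, first layer
```

With `return_dict_in_generate=True` the decode loop would iterate over `gen.sequences` instead of the raw tensor, but otherwise the surrounding code shouldn't need to change.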