```python
with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        images=image_tensor,
        max_new_tokens=256,
        do_sample=True
    )
```

This leads to `AttributeError: 'NoneType' object has no attribute 'shape'`.

```python
print("Prompt:", prompt)
print("Type of input_ids:", input_ids.dtype)
print("Shape of input_ids before generate:", input_ids.shape)
print("Shape of image_tensor:", image_tensor.shape)
print("Type of image_tensor:", image_tensor.dtype)
```

All of these print statements give correct output. Could anyone suggest what leads to the error? I am doing an inference check after fine-tuning a multimodal model.
You're probably passing `images=image_tensor` into a model that doesn't expect or properly handle that argument. Inside the `generate()` method, Hugging Face or custom model logic might expect the `images` argument to be processed by a `prepare_inputs_for_generation()` or `forward()` call; if that is not correctly implemented, it returns `None`, causing the error when `.shape` is accessed.
A side possibility: does the model support multimodality at all? For example, a typical `AutoModelForCausalLM` does not accept an `images` parameter.
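One quick way to check is to list the explicitly declared parameters of the model's `forward` and see whether `images` is among them. A name that is only swallowed by a catch-all `**kwargs` may be silently dropped, which is exactly how a `None` can surface later. A minimal sketch using only the standard library (the `forward` function below is a toy stand-in, not the real model):

```python
import inspect

def explicit_params(fn):
    """Return the explicitly declared parameter names of fn,
    ignoring a catch-all **kwargs."""
    return [
        name
        for name, p in inspect.signature(fn).parameters.items()
        if p.kind is not inspect.Parameter.VAR_KEYWORD
    ]

# Toy stand-in for a text-only forward() that silently swallows
# unknown kwargs such as `images` via **kwargs.
def forward(input_ids, attention_mask=None, **kwargs):
    pass

print("images" in explicit_params(forward))  # False: `images` would be ignored
```

On a real model you would pass `model.forward` instead of the toy `forward` shown here.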
Try this:

```python
output_ids = model.generate(input_ids=input_ids, max_new_tokens=256, do_sample=True)
```

If this works without error, your model or its generation path is not correctly set up to handle multimodal input.
Leave a like if this helped you at all 
Fix:
You are passing `images=image_tensor` to `model.generate`, but your model or generation config likely does not support the `images` argument, or your fine-tuned model doesn't handle multimodal input as expected.
Direct script correction:
```python
# Try this: remove `images` if the model doesn't support multimodal inference
output_ids = model.generate(
    input_ids=input_ids,
    max_new_tokens=256,
    do_sample=True
)
```
If your model does support images, make sure image_tensor is not None and is properly preprocessed. Otherwise, the error means image_tensor is None when accessed inside the generate function.
Check:
```python
assert image_tensor is not None, "image_tensor is None"
```
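Expanding on that assert, here is a hedged sketch of a pre-flight check you could run before calling `generate()`. The helper name `validate_inputs` and the `expect_images` flag are hypothetical, not part of any library:

```python
def validate_inputs(input_ids, image_tensor, expect_images=True):
    # Fail early with a clear message instead of letting a None reach
    # generate() and surface as "'NoneType' object has no attribute 'shape'".
    if input_ids is None:
        raise ValueError("input_ids is None")
    if expect_images and image_tensor is None:
        raise ValueError("image_tensor is None but multimodal input was expected")
    return True

# Usage before generation (dummy stand-ins shown for illustration):
validate_inputs([1, 2, 3], object())  # passes silently
```

A `ValueError` raised here is far easier to debug than the same `None` failing deep inside the generation loop.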
Solution provided by Triskel Data Deterministic AI.
`liuhaotian/llava-v1.5-7b` is my base model. I have fine-tuned it and pushed it to Hugging Face. I am giving both image and text input for the inference check. Please guide me.
If you want to use images, it seems that you need to pass the `pixel_values` argument instead of the `images` argument for the LLaVa model.
https://stackoverflow.com/questions/1109422/getting-list-of-pixel-values-from-pil
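If your calling code is written against the `images` name, a minimal sketch of adapting the call site follows. The `remap_kwargs` helper is hypothetical; only the `pixel_values` name comes from the suggestion above:

```python
def remap_kwargs(kwargs, mapping):
    # Rename keyword arguments, e.g. {"images": "pixel_values"}, so an
    # existing call site matches the parameter names the model expects.
    return {mapping.get(k, k): v for k, v in kwargs.items()}

call_kwargs = {"images": "<image tensor here>", "max_new_tokens": 256}
print(remap_kwargs(call_kwargs, {"images": "pixel_values"}))
# {'pixel_values': '<image tensor here>', 'max_new_tokens': 256}
```

You would then splat the remapped dict into the call, e.g. `model.generate(input_ids=input_ids, **remapped)`.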