LLaVA multi-image input support for inference

nielsr · August 30, 2024, 10:23am

Indeed, and another model which just got released which supports few-shots is Qwen2-VL, which is integrated natively in the Transformers library. See this tweet regarding sample usage.

Note that it’s a fast moving field so in about 2 weeks this model will again be surpassed by another one.

Topic		Replies	Views
Multimodal LLM with Image and Text sequentially in its prompt 🤗Transformers	2	12453	January 1, 2024
Turning a LLaMA model into a LLaVA Beginners	0	90	June 24, 2024
Looking information on the training set used in LLaVA Beginners	0	11	July 24, 2024
ValueError: Image features and image tokens do not match 🤗Transformers	2	2124	April 14, 2025
Error making predictions using LMM (LLaVA) model on multiple GPUs Intermediate	0	542	March 27, 2024

LLaVA multi-image input support for inference

Related topics