LLaVA multi-image input support for inference

Hey @alzaia ,
Or maybe consider using the Phi 3.5 vision model. Worked great for me:

The code snippets show how to add multiple images.
Best,
Mike