Thanks! This is what I was expecting. I saw the same kind of answers from the authors on their GitHub as well. I guess we will need to wait for LLaVA 2.0 for this (LLaVA 1.6 just came out, but I don't think it was trained on multi-image inputs).