Vil

Vision Language Models, Spatially aware VLMs, VLM grounding, Model compression, Language-Vision Representation