FlavaModel: multimodal_embeddings and text_embeddings shapes do not match

Hi, I want to use a text query to search images that have titles. Can I use multimodal_embeddings and text_embeddings for this search system?
However, the shape of multimodal_embeddings is (275, 768), while the shape of text_embeddings is (77, 768). How can I apply this model to image recall (retrieval)?
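
To make the question concrete, here is a rough sketch of what I have in mind. It assumes each output is a per-token sequence of shape (seq_len, 768) and that position 0 holds a CLS-style summary token, which I have not verified for FLAVA. It also assumes the two pooled vectors live in a comparable space, which may not be true. The random arrays and the helper names (`pool_first_token`, `cosine_similarity`, `random_sequence`) are placeholders of my own, not part of the model's API.

```python
import math
import random


def pool_first_token(seq_emb):
    """Reduce a (seq_len, hidden) sequence to one (hidden,) vector by taking position 0.

    Assumption: position 0 is a CLS-style summary token.
    """
    return seq_emb[0]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def random_sequence(seq_len, hidden, rng):
    """Placeholder data standing in for real model outputs."""
    return [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(seq_len)]


rng = random.Random(0)
multimodal_embeddings = random_sequence(275, 768, rng)  # image + title output
text_embeddings = random_sequence(77, 768, rng)         # query output

# Pool each sequence to a single vector so the shapes agree.
doc_vec = pool_first_token(multimodal_embeddings)
query_vec = pool_first_token(text_embeddings)

score = cosine_similarity(query_vec, doc_vec)
print(len(doc_vec), len(query_vec))
```

With pooling like this the sequence lengths 275 and 77 no longer matter, since both sides become a single 768-dimensional vector. What I am unsure about is whether this is the intended way to use the model for retrieval.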
