A bit late to answer this, but the problem might be that you're not batching your text queries. Also keep in mind that you may have hit the maximum number of text tokens you can pass to OWL, which uses the CLIP tokenizer.
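Below is a minimal sketch of both checks. It is written in plain Python so it runs without loading a model. `batch_queries` and `overlong_queries` are hypothetical helper names, and the whitespace token counter in the demo is only a stand-in. In a real setup you would count tokens with the model's own tokenizer, since OWL uses the CLIP tokenizer. You would also read the limit from the model config rather than assume a number. With Hugging Face `transformers`, I'd look at `model.config.text_config.max_position_embeddings`.

```python
from typing import Callable, Iterator, List, Sequence


def batch_queries(queries: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` text queries (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(queries), batch_size):
        yield list(queries[start:start + batch_size])


def overlong_queries(
    queries: Sequence[str],
    count_tokens: Callable[[str], int],
    max_tokens: int,
) -> List[str]:
    """Return the queries whose token count exceeds `max_tokens` (hypothetical helper)."""
    return [q for q in queries if count_tokens(q) > max_tokens]


# Assumed real-world wiring (not executed here), e.g. with Hugging Face transformers:
#   from transformers import CLIPTokenizer
#   tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
#   count = lambda q: len(tok(q)["input_ids"])  # includes start/end tokens
#   limit = model.config.text_config.max_position_embeddings

if __name__ == "__main__":
    queries = ["a cat", "a dog", "a red bicycle", "a person wearing a hat"]
    for batch in batch_queries(queries, batch_size=2):
        print(batch)
    # prints ['a cat', 'a dog'] then ['a red bicycle', 'a person wearing a hat']

    # Stand-in counter: whitespace words plus start/end tokens (NOT the CLIP tokenizer).
    approx = lambda q: len(q.split()) + 2
    print(overlong_queries(queries, approx, max_tokens=5))
    # prints ['a person wearing a hat']
```

Each batch can then go to the processor in one call instead of looping over the queries one at a time. Any query flagged by `overlong_queries` should be shortened, because text past the limit may be truncated or cause an error, depending on the processor settings.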