I’m trying to break apart BLIP2 from LAVIS (https://github.com/salesforce/LAVIS/blob/main/lavis/models/blip2_models/modeling_t5.py), which builds on a HuggingFace PreTrainedModel (T5), so that I can generate sentence embeddings from the T5 model.
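For context, here is a minimal sketch of what "sentence embeddings from T5" could look like. It assumes the embedding is the mean-pooled output of the T5 encoder, computed from an inputs_embeds tensor instead of token ids. The tiny random config, the hypothetical "visual prefix" tensor (a stand-in for BLIP2's projected Q-Former output) and the mean pooling are all assumptions for illustration, not what LAVIS itself does.

```python
import torch
from transformers import T5Config, T5EncoderModel

# Tiny, randomly initialised T5 encoder so the sketch runs offline.
# With real weights you would use T5EncoderModel.from_pretrained(...) instead.
config = T5Config(vocab_size=100, d_model=16, d_kv=4, d_ff=32,
                  num_layers=2, num_heads=4)
encoder = T5EncoderModel(config).eval()


def mean_pooled_embeddings(inputs_embeds, attention_mask):
    """Run the T5 encoder on precomputed embeddings and mean-pool over unmasked positions."""
    with torch.no_grad():
        hidden = encoder(inputs_embeds=inputs_embeds,
                         attention_mask=attention_mask).last_hidden_state  # (B, L, d_model)
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)                    # (B, L, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)     # (B, d_model)


# Hypothetical inputs: a block of prefix embeddings (stand-in for the projected
# Q-Former output) concatenated with ordinary text token embeddings.
batch, n_query, n_text = 2, 4, 5
prefix = torch.randn(batch, n_query, config.d_model)
token_ids = torch.randint(2, config.vocab_size, (batch, n_text))
text_embeds = encoder.get_input_embeddings()(token_ids)
inputs_embeds = torch.cat([prefix, text_embeds], dim=1)
attention_mask = torch.ones(batch, n_query + n_text, dtype=torch.long)
attention_mask[1, -2:] = 0  # pretend the second sentence ends with two padding tokens

sentence_embeddings = mean_pooled_embeddings(inputs_embeds, attention_mask)
print(sentence_embeddings.shape)  # torch.Size([2, 16])
```

Mean pooling over the attention mask is only one option; pooling just the text positions or taking a single position are equally plausible choices depending on the downstream task.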
So, two questions:
- What should I replace these lines with to get the embeddings? https://github.com/salesforce/LAVIS/blob/e4040b13d6120062829ee9625f016f3cd3dd16e6/lavis/models/blip2_models/blip2_t5.py#L296-L304
- What is passed as decoder_input_ids to ConditionalGeneration models (e.g. T5ForConditionalGeneration) during generation when there are no input_ids, only inputs_embeds?
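To probe the second question empirically, here is a minimal sketch using a tiny random T5ForConditionalGeneration. It assumes that, for encoder-decoder models, generate() builds the initial decoder_input_ids from decoder_start_token_id (one token per batch element) whenever you don't pass them yourself; decoder_start_token_id=0 is set explicitly here because pretrained T5 checkpoints use the pad id 0 for it. The exact internals may differ between transformers versions, so treat this as a check rather than a specification.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Tiny random T5 so the sketch runs offline; decoder_start_token_id=0 mirrors
# the value used by pretrained T5 checkpoints (it equals the pad token id).
config = T5Config(vocab_size=100, d_model=16, d_kv=4, d_ff=32, num_layers=1,
                  num_heads=4, pad_token_id=0, eos_token_id=1,
                  decoder_start_token_id=0)
model = T5ForConditionalGeneration(config).eval()

batch, seq_len = 2, 6
inputs_embeds = torch.randn(batch, seq_len, config.d_model)
attention_mask = torch.ones(batch, seq_len, dtype=torch.long)

# Only inputs_embeds go to the encoder; no input_ids and no decoder_input_ids.
out = model.generate(inputs_embeds=inputs_embeds,
                     attention_mask=attention_mask,
                     max_new_tokens=3, do_sample=False)
print(out[:, 0])  # every sequence starts with decoder_start_token_id

# So the first greedy step should match a plain forward pass whose
# decoder_input_ids are filled with decoder_start_token_id.
decoder_input_ids = torch.full((batch, 1), config.decoder_start_token_id, dtype=torch.long)
with torch.no_grad():
    logits = model(inputs_embeds=inputs_embeds,
                   attention_mask=attention_mask,
                   decoder_input_ids=decoder_input_ids).logits  # (B, 1, vocab)
first_token = logits[:, -1, :].argmax(dim=-1)
print(torch.equal(first_token, out[:, 1]))
```

The same forward call (with decoder_input_ids built this way) is also how you could pull decoder hidden states for the first step without going through generate().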