Hello,
Is there anyone who can help me understand and implement
microsoft/Phi-3-vision-128k-instruct
into LangChain as an agent? I cannot figure out how to initialize the agent and pass an image together with the prompt.
LangChain's HuggingFacePipeline does not seem to implement such a feature yet.
Same with Ollama, and I don't want to use Azure.
Maybe there is some other option: inherit from a LangChain base class and create a custom class whose invoke accepts a prompt together with an image, and preprocesses them the way the authors of microsoft/Phi-3-vision-128k-instruct show in their example? Something like the sketch below?
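For reference, here is roughly what I have in mind: a minimal, untested sketch that subclasses `BaseChatModel` from langchain-core and reuses the generation code from the Phi-3-vision model card. The class name `PhiVisionChatModel`, the field names, and the base64 data-URL handling are my own assumptions, not anything official from LangChain or Microsoft, and it assumes a CUDA GPU is available.

```python
import base64
import io
from typing import Any, List, Optional

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.outputs import ChatGeneration, ChatResult
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# LangChain message types -> roles expected by the Phi-3 chat template.
ROLE_MAP = {"human": "user", "ai": "assistant", "system": "system"}


class PhiVisionChatModel(BaseChatModel):
    """Unofficial sketch: LangChain chat-model wrapper around Phi-3-vision."""

    hf_model_id: str = "microsoft/Phi-3-vision-128k-instruct"
    max_new_tokens: int = 500
    hf_model: Any = None
    hf_processor: Any = None

    def __init__(self, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        # Loading code taken from the Hugging Face model card (requires a GPU).
        self.hf_model = AutoModelForCausalLM.from_pretrained(
            self.hf_model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
        )
        self.hf_processor = AutoProcessor.from_pretrained(self.hf_model_id, trust_remote_code=True)

    @property
    def _llm_type(self) -> str:
        return "phi-3-vision"

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        # Convert LangChain messages (possibly with multimodal content blocks)
        # into the <|image_N|> chat format shown on the model card.
        images: List[Image.Image] = []
        chat: List[dict] = []
        for msg in messages:
            role = ROLE_MAP.get(msg.type, "user")
            if isinstance(msg.content, str):
                chat.append({"role": role, "content": msg.content})
                continue
            parts: List[str] = []
            for block in msg.content:
                if block.get("type") == "text":
                    parts.append(block["text"])
                elif block.get("type") == "image_url":
                    # Assumes a base64 data URL: "data:image/...;base64,<bytes>".
                    b64 = block["image_url"]["url"].split(",", 1)[1]
                    images.append(Image.open(io.BytesIO(base64.b64decode(b64))))
                    parts.insert(0, f"<|image_{len(images)}|>")
            chat.append({"role": role, "content": "\n".join(parts)})

        prompt = self.hf_processor.tokenizer.apply_chat_template(
            chat, tokenize=False, add_generation_prompt=True
        )
        inputs = self.hf_processor(prompt, images or None, return_tensors="pt").to("cuda")
        out = self.hf_model.generate(
            **inputs,
            max_new_tokens=self.max_new_tokens,
            eos_token_id=self.hf_processor.tokenizer.eos_token_id,
        )
        out = out[:, inputs["input_ids"].shape[1]:]  # strip the echoed prompt tokens
        text = self.hf_processor.batch_decode(out, skip_special_tokens=True)[0]
        return ChatResult(generations=[ChatGeneration(message=AIMessage(content=text))])
```

If something like this is viable, I would then invoke it with LangChain's standard multimodal message format, e.g. (`b64_image` is a placeholder for base64-encoded image bytes):

```python
from langchain_core.messages import HumanMessage

llm = PhiVisionChatModel()
msg = HumanMessage(content=[
    {"type": "text", "text": "What is shown in this image?"},
    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64_image}"}},
])
print(llm.invoke([msg]).content)
```

Is this the right direction, or is there a better way to plug it into an agent?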
TY