Hey everyone, could you help me find the best open-weights model stack for reliably generating, from text and/or a few reference images of an object, the “views” needed to build an octahedral impostor of that object? For example: “maple tree” + image(s) → octahedral impostor atlas/views.
Is there a direct way to get all the necessary views from text/image input?
Or should I generate a 3D model and then extract the views I need myself? I feel like that's the only way to guarantee exactly the precisely spaced camera perspectives I need. At the same time, my impression is that current state-of-the-art text/image-to-image quality is far higher than text/image-to-3D.
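For context, here is what "exactly spaced perspectives" means concretely: every cell of an N×N impostor atlas corresponds to one fixed camera direction on an (hemi-)octahedron. A minimal sketch, assuming the standard octahedral unit-vector encoding (full sphere, plus the hemi-octahedral variant for objects only seen from above), z as the up axis, and a corner-inclusive grid where frame i samples i/(N-1); some impostor bakers sample cell centers instead. All function names here are hypothetical helpers, not from any library.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def _sign_not_zero(x: float) -> float:
    # Treat 0 as positive (usual in octahedral decoding) so this never returns 0.
    return 1.0 if x >= 0.0 else -1.0


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)


def octahedral_decode(u: float, v: float, hemi: bool = False) -> Vec3:
    """Map a point (u, v) in [-1, 1]^2 to a unit direction, with z as 'up'."""
    if hemi:
        # Hemi-octahedral variant: square rotated 45 degrees, covers z >= 0 only.
        x, y = (u + v) * 0.5, (u - v) * 0.5
        return _normalize((x, y, 1.0 - abs(x) - abs(y)))
    x, y, z = u, v, 1.0 - abs(u) - abs(v)
    if z < 0.0:
        # Fold the four corners of the square onto the lower hemisphere.
        x, y = (1.0 - abs(v)) * _sign_not_zero(u), (1.0 - abs(u)) * _sign_not_zero(v)
    return _normalize((x, y, z))


def impostor_view_directions(frames: int, hemi: bool = False) -> List[List[Vec3]]:
    """One view direction per atlas cell, indexed [row][col].

    Assumption: corner-inclusive sampling (uv = i / (frames - 1)), so the
    outermost frames land exactly on the octahedron's edges and poles.
    """
    if frames < 2:
        raise ValueError("need at least a 2x2 grid")
    grid = []
    for row in range(frames):
        v = row / (frames - 1) * 2.0 - 1.0
        grid.append([
            octahedral_decode(col / (frames - 1) * 2.0 - 1.0, v, hemi)
            for col in range(frames)
        ])
    return grid


def camera_position(direction: Vec3, center: Vec3, radius: float) -> Vec3:
    """Place a camera on the bounding sphere; it should look back at `center`."""
    return tuple(c + radius * d for c, d in zip(center, direction))


if __name__ == "__main__":
    # Print the 8x8 hemi-octahedral directions, e.g. for a tree seen from above.
    for row in impostor_view_directions(8, hemi=True):
        print(" ".join(f"({x:+.2f},{y:+.2f},{z:+.2f})" for x, y, z in row))
```

Each direction would become one orthographic render in the atlas. For a y-up engine, swap the y and z components; the hemi variant spends all frames on the upper hemisphere, which suits objects that are never seen from below.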
I did some research but didn’t find much. I’m going to start experimenting soon; I was just wondering whether you HF experts could share your gut feelings. Maybe it turns out I can just use the first half of the forward pass of a recent model that does something more complex (e.g. an image-to-3D pipeline whose first stage generates multi-view images), you tell me. In any case, I don’t think this project will need very complex plumbing; I’m just looking for better/faster solutions if you have anything in mind.
Thanks in advance