I want to use an AI image tool to create images that show certain items. To do this, I want to use a reference image that has a transparent background.
I have already tried three models (including Midjourney). The images themselves are good, but they differ enormously from the reference image. If, for example, I need a Sherman tank for my game, it has to stay a Sherman; it cannot really be changed. Since the models do not seem to know the items (in this case a tank), some different tank always comes out, one that has nothing to do with the historically correct vehicle.
Ideally, a prompt would give me a definable background around exactly the item shown in the reference image. What would be the best solution for this, since the existing models are apparently not capable of it?
One way is to pick any of the excellent existing models and teach it Sherman tanks and so on, but that is expensive and time-consuming…
So how about using inpainting?
The AI will re-draw an image based on a reference image.
There are also many similar functions to this. You can do more advanced things with ControlNet, for example.
If you want to focus on detail, there is a way to create a LoRA instead of radically retraining an existing model. This is inexpensive.
I created an image with an AI. The task was to add a Tiger tank, but it did not.
When I tell the AI to replace it, it can easily replace the tank, but not with the one I wanted. Even when I add an example with a Tiger tank and a transparent background, it can't do that. So I guess training a model or a LoRA would be the best solution, but I need to check your links further…
As for a LoRA, you can think of it as little more than a swappable equipment pack for a model.
You can create a LoRA and then inpaint with the model that has the LoRA applied.
You can use as many LoRAs as you want at the same time, so you can have one LoRA for the background, one for the tanks, one for the people, and so on. If the weapon is well known, there may already be a LoRA for it on Civitai.
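As a rough sketch of what "several LoRAs at the same time" can look like with the diffusers library: the repo IDs, file names and adapter names below are hypothetical placeholders, and as far as I know naming adapters this way also needs the peft package installed. It is only meant to show the shape of the code, not a tested setup.

```python
# Sketch only: the LoRA repos, file names and adapter names are hypothetical.
BASE_MODEL = "stabilityai/stable-diffusion-xl-base-1.0"

# (adapter name, LoRA repo or local folder, weight file, strength)
LORAS = [
    ("tank", "your-name/sherman-tank-lora", "sherman.safetensors", 0.9),
    ("background", "your-name/ww2-landscape-lora", "landscape.safetensors", 0.6),
]


def adapter_settings(loras):
    """Split the table into the two parallel lists that set_adapters() takes."""
    names = [name for name, _, _, _ in loras]
    weights = [weight for _, _, _, weight in loras]
    return names, weights


def load_pipeline_with_loras(base_model=BASE_MODEL, loras=LORAS):
    # Heavy imports stay inside the function, so the helper above can be
    # used without a GPU or diffusers being installed.
    import torch
    from diffusers import AutoPipelineForInpainting

    pipe = AutoPipelineForInpainting.from_pretrained(
        base_model, torch_dtype=torch.float16
    ).to("cuda")
    # Load every LoRA under its own adapter name.
    for name, repo, weight_file, _ in loras:
        pipe.load_lora_weights(repo, weight_name=weight_file, adapter_name=name)
    # Activate all of them at once, each with its own strength.
    names, weights = adapter_settings(loras)
    pipe.set_adapters(names, adapter_weights=weights)
    return pipe
```

Lowering a weight (for example the background one) is usually the first thing to try when one LoRA starts to overpower the others.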
There are many tricks to making LoRAs, but there is so much know-how accumulated that you don’t need to do it by trial and error.
You can find a know-how site that suits you by searching with the following clues.
I used the reference image with the transparent background; I think you can see it quite well. The tank itself tends to be way too big in scale, but I will try to handle this by making it smaller in the reference image.
This is what I wanted, but I have to say the RealVisXL V5.0 Lightning model is a little bit limited. But maybe this is exactly why it works.
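The idea of making the item smaller inside the reference image could be sketched roughly like this with Pillow. The 1024x1024 canvas, the 40% width fraction and the placement near the bottom are assumptions for illustration only; the function names are made up.

```python
def fit_size(item_w, item_h, canvas_w, canvas_h, fraction=0.4):
    """Size at which the item spans `fraction` of the canvas width.

    The aspect ratio is kept and the item never gets taller than the canvas.
    """
    scale = min(canvas_w * fraction / item_w, canvas_h / item_h)
    return max(1, round(item_w * scale)), max(1, round(item_h * scale))


def place_on_canvas(reference_path, out_path, canvas=(1024, 1024), fraction=0.4):
    from PIL import Image  # Pillow, imported lazily

    item = Image.open(reference_path).convert("RGBA")
    new_size = fit_size(item.width, item.height, canvas[0], canvas[1], fraction)
    item = item.resize(new_size, Image.LANCZOS)

    # Fully transparent sheet that becomes the new, larger reference image.
    sheet = Image.new("RGBA", canvas, (0, 0, 0, 0))
    x = (canvas[0] - item.width) // 2
    # Leave roughly a fifth of the canvas free below the item for ground.
    y = max(0, canvas[1] - item.height - canvas[1] // 5)
    sheet.paste(item, (x, y), item)  # third argument: use the item's alpha as mask
    sheet.save(out_path)
    return sheet
```

A smaller `fraction` leaves more room for the generated surroundings, which may help the model pick a more believable scale.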
There is a collection of models and inpainting-related (image-to-image) tools available.
You can find most of the existing models and LoRA for free on Civitai, HF, and anonymous forums on the net, so you can start by looking through these and then make what you are missing.
You can test your base model at HF for free.
Also, HF may have datasets of images that are useful for training. There are too many to search through easily, though.
Thanks for your help, it is impressive. For now, let me test the inpaint options you showed me in your first answer. The LoRA is a completely different thing, if I understood you correctly, so let me finish these tests first, then I will check the rest.
Honestly, you presented a lot to me, so it will take some time.
If the inpaint option generated larger images and had a better understanding of scale, I would already be happy.
As for inpainting, the result depends on the composition of the original image, so if you deliberately skip the transparent background and just make a rough composition with people added, the model will redraw it accordingly. Even if it is a doodle, it will redraw it rather nicely.
I think you should first look for a Space for inpainting that suits you. There are many…
Okay, this is what I have found out so far:
The Diffusers Fast Inpaint Space by OzzyGT on Hugging Face works fine for me, because I can upload the reference image without masking it. All the others I have found want me to mask the parts I want to replace, and they throw an error if I don't. The problem is that I don't want to replace any part of the reference image, and it is hard to mask white on a white background in the web tool.
Because this tool does not throw an error, I can create a background. It fits the reference image quite nicely, but it has a weird understanding of scale. One downside is the low resolution. I am not sure whether I can manage to create my own copy for my own needs. Anyway, this is the best option I have found so far, compared with the other inpaint links from your very first posts and with other inpaint tools on the web.
In short, I need to use this one, build my own very similar solution, or train a LoRA.
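For the problem of masking white on white by hand: if the reference image really has a transparent background, the mask can in principle be derived from its alpha channel instead of being painted. Below is a minimal sketch with Pillow, assuming the usual diffusers convention that white mask pixels are repainted and black ones are kept. The threshold of 8 and the function names are my own choices, not something taken from any of the Spaces.

```python
def alpha_to_mask_value(alpha, threshold=8):
    """Map one alpha value to a mask value.

    255 (white) = let the model repaint this pixel (transparent background),
    0 (black)   = keep this pixel (the item itself).
    """
    return 255 if alpha < threshold else 0


def mask_from_reference(reference_path, mask_path, threshold=8):
    from PIL import Image  # Pillow, imported lazily

    rgba = Image.open(reference_path).convert("RGBA")
    alpha = rgba.getchannel("A")
    # point() applies the function to every possible pixel value (0..255).
    mask = alpha.point(lambda a: alpha_to_mask_value(a, threshold))
    mask.save(mask_path)
    return mask
```

The saved mask can then be uploaded together with the reference image to tools that insist on a mask, so nothing of the item itself is marked for replacement.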
If RealVisXL V5.0 is good enough for you, you can choose a base model from well over 1000 SDXL models (and that is a modest estimate). There is also simply a newer version of RealVis.
From here there are several ways to go, but HF Spaces are free to copy, so you can duplicate one and customize it for your own use. Don’t know how to program? Ask someone who does and you’re done. I can also easily show you how to modify it.
If you want to copy and use a Space that says ZeroGPU, just note that it costs $10 per month and is limited to 10 slots per person.
CPU Spaces are free.
You mean that if I create my own Space, or copy one and modify it, I can choose between different models, correct?
I think it is worth the time to take a deep look into it, but I need to start from scratch.
My understanding is that I can use Gradio, just as the developer of the tool I linked above did. But as I said, I need to take a deeper look into it.
You mean if I create my own one / copy one and modify it
Good morning.
That’s what I mean. If you can tinker with Gradio or Python, then it is simple.
In the Space above, the model is specified by just a str or a list, so once you find an alternative model, you can switch to it by changing a few lines.
RealVisXL belongs to the SDXL architecture, which is the most popular one nowadays, so you can follow the link below for alternative model candidates. For the most part, they will work with only a small rewrite of the code.
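To give an idea of how small that change typically is, here is a sketch of a model table and loader written with diffusers. This is not the actual code of the Space above; the short names are made up, and the repo IDs are what I believe the Hub repos are called, so please check them before relying on them.

```python
# Short name -> Hub repo ID. The repo IDs are assumptions; verify them on the Hub.
MODELS = {
    "realvis-lightning": "SG161222/RealVisXL_V5.0_Lightning",
    "sdxl-base": "stabilityai/stable-diffusion-xl-base-1.0",
}
DEFAULT_MODEL = "realvis-lightning"


def resolve_model(name=DEFAULT_MODEL):
    """Return the repo ID for a short name; pass a full repo ID through unchanged."""
    return MODELS.get(name, name)


def load_inpaint_pipeline(name=DEFAULT_MODEL):
    # Heavy imports stay inside the function (they need a GPU environment).
    import torch
    from diffusers import AutoPipelineForInpainting

    return AutoPipelineForInpainting.from_pretrained(
        resolve_model(name), torch_dtype=torch.float16
    ).to("cuda")
```

Switching to another SDXL model is then a matter of adding one line to `MODELS` or passing a different repo ID to `load_inpaint_pipeline`.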
Furthermore, HF has a function that combines a LoRA and a model as if they were a single repo, which lets you run a single LoRA as a model with the LoRA applied, with almost no code. To use this function, you just need to create a new model repo, upload one LoRA file, and write a few lines in README.md (actually, a YAML configuration block).
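The few README.md lines mentioned above are, as far as I understand it, a YAML metadata block at the top of the file that names the base model and tags the repo as a LoRA. The small helper below just builds such a block as text; the field names (`base_model`, `tags`, `instance_prompt`) are the ones I have seen in LoRA repos on the Hub, and the base model and trigger words in the usage line are only examples, so treat the whole thing as an assumption to verify against the HF documentation.

```python
def lora_readme(base_model, trigger_words, extra_tags=()):
    """Build the YAML metadata block for the top of a LoRA repo's README.md."""
    tags = ["lora", "diffusers", "text-to-image", *extra_tags]
    lines = ["---", f"base_model: {base_model}", "tags:"]
    lines += [f"- {tag}" for tag in tags]
    lines.append(f"instance_prompt: {trigger_words}")
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; replace them with your own base model and trigger words.
    print(lora_readme("stabilityai/stable-diffusion-xl-base-1.0", "sherman tank"))
```

The printed text would go at the very top of README.md in the new model repo, next to the uploaded LoRA file.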
By copying and modifying the above space for your own use, you can easily add or tweak functions, change models, and apply LoRA.
Image-to-Image, including Inpaint, actually allows you to specify a wide variety of parameters internally, so you can often specify, for example, the fidelity to the original image.
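In diffusers, the main knob for fidelity to the original image is `strength` (0 to 1): low values stay close to the input, high values let the model redraw more. Below is a minimal sketch, assuming a pipeline object `pipe` has already been loaded elsewhere; the step calculation mirrors what I understand diffusers to do internally, and the default values are arbitrary.

```python
def effective_steps(num_inference_steps, strength):
    """Number of denoising steps that actually run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


def redraw(pipe, prompt, image, mask, strength=0.6, steps=8):
    """Inpaint `image` where `mask` is white, keeping the rest of the picture."""
    if effective_steps(steps, strength) < 1:
        raise ValueError("strength is too low for this few steps")
    result = pipe(
        prompt=prompt,
        image=image,
        mask_image=mask,
        strength=strength,
        num_inference_steps=steps,
    )
    return result.images[0]
```

This matters especially for few-step models such as the Lightning variants: with 4 steps and a strength of 0.2, no denoising step would be left, so either the strength or the step count has to go up.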
This approach of creating a repo, placing what you need, and using it from copied Spaces is one of the most common ways to use HF. The same thing can be done with language models, for example.