Image Implantation/Manipulation

Hello, I’m new to ML/AI and HuggingFace.

I am a developer by profession with over 20 years of experience, but not in Machine Learning.

I’m embarking on a personal project where I need to programmatically load an image and, using AI, take a person, cat, or dog from that image and seamlessly insert it into a second image, replacing a person, cat, or dog there.

Can someone please point me to a library, research area, or tools I can look into to learn how to do this?

What I’ve found on the internet so far only works on humans and usually doesn’t have an API. I’m happy to spend a month or two working on this problem, but I need guidance. If someone could be so kind as to point me in the right direction, I’ll be grateful.

Hey jchukwum,
Welcome to the club - I’m in the same boat: long-time dev experience, but still pretty fresh to machine learning.

What you want to do is not a single process; it’s several different processes chained together. For a first step into ML, you’ve chosen quite a tough one. It involves object detection, image manipulation and, if you mean “seamlessly” literally, a considerably more complex final step:

That step could be done with generative image AI and a process called “inpainting”: a neural network replaces parts of an image with either newly generated, context-specific content or adjusted given content (like the cat you just extracted).
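As a rough illustration of how detection and inpainting connect: an inpainting model is usually given a mask in which white pixels are repainted and black pixels are kept (the convention used by Stable Diffusion inpainting in diffusers and Automatic1111). The sketch below assumes the detector returns an axis-aligned box `(x_min, y_min, x_max, y_max)` in pixels and turns it into such a mask. `box_to_inpaint_mask` and its `pad` parameter are hypothetical names, and a proper segmentation mask would hug the animal far more closely than a rectangle does.

```python
import math
from typing import List, Tuple

# Axis-aligned box in pixel coordinates: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_to_inpaint_mask(box: Box, width: int, height: int, pad: int = 0) -> List[List[int]]:
    """Hypothetical helper: rasterize one box into a binary inpainting mask.

    255 marks pixels the model should repaint, 0 marks pixels to keep.
    `pad` grows the box on every side so the model also repaints a margin
    of surrounding context, which usually hides the seam better.
    """
    x0, y0, x1, y1 = box
    # Grow the box by `pad` and clamp it to the image bounds (end is exclusive).
    left = max(0, math.floor(x0) - pad)
    top = max(0, math.floor(y0) - pad)
    right = min(width, math.ceil(x1) + pad)
    bottom = min(height, math.ceil(y1) + pad)

    mask = [[0] * width for _ in range(height)]
    for y in range(top, bottom):
        for x in range(left, right):
            mask[y][x] = 255
    return mask


if __name__ == "__main__":
    # Print a small mask: "#" = repaint, "." = keep.
    m = box_to_inpaint_mask((2, 2, 4, 4), width=6, height=6, pad=1)
    for row in m:
        print(" ".join("#" if v else "." for v in row))
```

With Pillow installed, `Image.frombytes("L", (width, height), bytes(v for row in mask for v in row))` should turn the result into a grayscale mask image that inpainting tools accept.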

If you could describe the requirements in a bit more detail, especially this last heavy step, I can surely at least point you in the right direction(s).

For the earlier steps (object detection, for example), have a look at OpenCV, torchvision, etc. You don’t need to understand much about AI to implement them: there are good pretrained models for basic object detection and classification that you can load with a few lines of code, and with your experience the rest of the code isn’t a big deal.
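For the detection step, torchvision’s pretrained detectors (for example `fasterrcnn_resnet50_fpn`) return, per image, a dict with `boxes`, `labels` and `scores` tensors, where labels are COCO category ids. Below is a minimal filter over that output, assuming a COCO-trained model in which id 1 is person, 17 is cat and 18 is dog. It takes plain lists (e.g. `out["boxes"].tolist()`) so it runs without torch installed; `pick_targets` and the 0.5 threshold are illustrative choices, not part of torchvision.

```python
from typing import List, Sequence, Tuple

# COCO category ids used by torchvision's COCO-trained detectors (0 = background).
# Assumption: the detector was trained on COCO; weights.meta["categories"] lists the names.
TARGET_LABELS = {1: "person", 17: "cat", 18: "dog"}

Detection = Tuple[str, float, Tuple[float, ...]]


def pick_targets(
    boxes: Sequence[Sequence[float]],
    labels: Sequence[int],
    scores: Sequence[float],
    min_score: float = 0.5,
) -> List[Detection]:
    """Keep person/cat/dog detections at or above `min_score`, highest score first."""
    hits: List[Detection] = []
    for box, label, score in zip(boxes, labels, scores):
        name = TARGET_LABELS.get(int(label))
        if name is not None and score >= min_score:
            hits.append((name, float(score), tuple(box)))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits
```

With torchvision itself, the inputs would come from something like `out = model([img])[0]`, where the model is built with `fasterrcnn_resnet50_fpn(weights=FasterRCNN_ResNet50_FPN_Weights.DEFAULT).eval()` and `img` is preprocessed with `weights.transforms()`. The top hit’s box marks the region to replace.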

Thank you, ReatKay, for your quick response!

If you could describe the requirements a bit more in detail

So I’d like to replace the cool dog with the scarf in the picture below with the other dog, the one on the white background (I’ll post the second image next; I’m only allowed one image per post).

The image with the scarf and glasses was created with generative AI. Now I want to programmatically create a third image with the shaggy dog replacing the dog with the glasses. It doesn’t have to be a real photo; it could be generated, but it should be a likeness of the shaggy dog, with similar clothing, lighting, and background.

I know it’s tricky, but I’d like to solve the problem for any dog.

I hope that makes sense. You’ve already given me some areas to look into, but let me know whether your answer still holds given this new description.

Ahhh, now it’s clear…
Well, that’s a use case that can be solved completely through inpainting or, even more automatically, by using ControlNet (usable with img2img, txt2img, etc.) on Stable Diffusion, if I’m not badly mistaken. I only use inpainting in a very limited way at the moment, to remove watermarks, text, and logos when building datasets for image generation, but I use ControlNet daily for generation.

ControlNet allows you, for example, to mask objects in the input image and replace them with another object guided by their silhouette, depth data, pose (through a mechanism called OpenPose), etc., so the new object matches the silhouette or the pose pretty much exactly.

I would suggest installing Automatic1111, a great web GUI for Stable Diffusion, and then installing the “ControlNet” extension on it. Pretty straightforward. And the good part: almost everything that can be done in the web frontend is also available as an API, so you can write a client application or an automation mechanism that sends the data to the API and gets the output back. I use ControlNet, for example, to create character drafts, always in the same pose, and then use ControlNet to transfer the face (so it’s always the same). That way I get the same character posing the same way, just with different hair, colors, background, etc., reducing the randomness factor.
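To drive this from your own backend: the Automatic1111 web UI exposes its API when started with the `--api` flag, and the ControlNet extension hooks into the regular `txt2img`/`img2img` endpoints through `alwayson_scripts`. The sketch below builds an inpainting request for `/sdapi/v1/img2img` with one ControlNet unit and posts it using only the standard library. It assumes a local instance on the default port 7860. The `input_image`/`module`/`model` unit fields follow the extension’s API as I know it but may differ between versions, and the module and model names you pass (e.g. `"canny"`, `"depth_midas"`, `"openpose"`) must match what your installation offers. Since OpenPose estimates human body keypoints, a silhouette- or depth-based preprocessor may suit dogs better. `build_inpaint_payload` and `send_img2img` are hypothetical helper names.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List

# Assumption: Automatic1111 running locally with --api and the ControlNet extension.
API_URL = "http://127.0.0.1:7860/sdapi/v1/img2img"


def to_b64(data: bytes) -> str:
    """The API expects images as base64-encoded strings."""
    return base64.b64encode(data).decode("ascii")


def build_inpaint_payload(
    init_png: bytes,
    mask_png: bytes,
    control_png: bytes,
    prompt: str,
    module: str,
    model: str,
    denoising_strength: float = 0.75,
) -> Dict[str, Any]:
    """Hypothetical helper: img2img inpainting request plus one ControlNet unit."""
    return {
        "init_images": [to_b64(init_png)],  # the scene containing the dog to replace
        "mask": to_b64(mask_png),  # white = repaint, black = keep
        "mask_blur": 8,  # soften the mask edge so the seam blends in
        "prompt": prompt,  # text description of the replacement dog
        "denoising_strength": denoising_strength,
        "alwayson_scripts": {
            "controlnet": {
                "args": [
                    {
                        "input_image": to_b64(control_png),
                        "module": module,  # preprocessor name, e.g. "depth_midas"
                        "model": model,  # must match a ControlNet model installed in your UI
                    }
                ]
            }
        },
    }


def send_img2img(payload: Dict[str, Any], url: str = API_URL, timeout: float = 300.0) -> List[bytes]:
    """POST the payload and decode the base64 images in the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return [base64.b64decode(image) for image in body["images"]]
```

Rather than hard-coding names, you can query the extension’s `/controlnet/model_list` and `/controlnet/module_list` endpoints first, assuming your version provides them. Keeping the payload builder separate from the HTTP call also makes it easy to port the same JSON shape to a Go client later.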

Try it out and if you run into issues, feel free to send me a direct message, will try to help with pleasure :slight_smile:


Thanks so much, ReatKay.
I’ll look into Automatic1111 and ControlNet over the next week and see how I go. I definitely need to use it through an API, as I’m building a web frontend for it.

Thanks again!

What language are you going to code in, if I may ask? :slight_smile:

At the moment my platform is using Go, Python, and JavaScript. When I want to spin up threads I prefer using Go, but I use Python for image manipulation like removing the background of images - as it seems to be the language of choice. But when I need speed I reach for Go.

I’m versed in C++, Go, Java, Python, and JavaScript so I’m able to work on what the task demands.

Uhh, Java… it’s been like 20 years since I had to learn it during my apprenticeship. I’ve always been more on the C highway, and later C#. JS was and still is my #1 hated language, but since TypeScript it has become a bit better; I favor strong typing. I only had C++ in school during my apprenticeship, very limited. I basically just started Python because of deep learning and have only really been coding in it for maybe 2-3 weeks; before that I just wrote short training scripts.

I build most stuff in C#, using ASP.NET Core as the web framework, and TypeScript if I have to build something frontend… Mostly I just build APIs for my stuff, and these days we can generate a frontend API client automatically, so thankfully not much JS for me :wink:

I’m not much of a fan of Microsoft products, but they did a decent job with C#/.NET. I was scared when I worked with it on WPF for an Order Management System. It was awful. But I concede it’s a WPF issue, not C#. I do know C# is eating Java’s lunch at the moment, but there are lots of systems written in Java in my industry.

Be that as it may, I was saved by AngularJS and now React so I binned WPF. I may be the only person in the world not using TS. Again, I’m Microsoft shy. It’s only a matter of time, I suppose.

When I want to build a frontend, I use React/MUI/Next.js. The backend language of choice is Go. I use Python for fun stuff or when I want an easy life.

I’ve not really done much Java for around 5 years. However, I just had a technical interview on it and enjoyed coding in it for that.

C++ is just for raw performance. Not friendly, just business. Makes you realise that other languages have stabilisers strapped on. I used it the other day to process millions of lines of historical prices and it ate it in seconds. I was blown away. I’ve dabbled in Rust but don’t have a real need to learn or use it.

Anyway, I can chat about this for hours. I’d better focus on the work ahead.

Well, it depends on what kind of thing you do :slight_smile: There are situations where even managed and interpreted languages are faster than C++, but in general, yeah, I agree with you. I was the same with MS, apart from their office tools -laugh- but then the big change came… the .NET unification and open-sourcing. Heck, this changed so much :slight_smile: I started to like it! So fuckin’ well performing, such a big framework, and for me one of the biggest changes: Native Ahead-of-Time compilation. I can, with some limitations, finally build fully native console apps but still have the advantages and simplicity of C# and much of the framework.

Oh heck… WPF… tbh… I tried and failed, but since I was trained on the job at a big retail company, I had years of learning WinForms at work. In the end, I never saw a real reason to switch to WPF: WinForms is such a mature UI for .NET, with a long list of great third-party libs, and they even brought WinForms over with the unification. What will be really interesting is how MAUI develops… looks promising… but well, in the beginning, WPF looked promising too :wink: