InstantID with another IP-Adapter

I want to use an InstantID pipeline together with another IP-Adapter on SDXL.

Here is what I have:

```python
import diffusers
from diffusers.utils import load_image
from diffusers.models import ControlNetModel
from transformers import CLIPVisionModelWithProjection


# Custom diffusers implementation of InstantID & insightface
from insightface.app import FaceAnalysis
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline, draw_kps

# Other dependencies
import cv2
import torch
import numpy as np
from PIL import Image

from compel import Compel, ReturnedEmbeddingsType


app_face = FaceAnalysis(name='antelopev2', root='./', providers=['CPUExecutionProvider'])  # or 'CUDAExecutionProvider' on GPU
app_face.prepare(ctx_id=0, det_size=(640, 640))

# model weights prepared under ./models
face_adapter = "./models/instantid/ip-adapter.bin"
controlnet_path = "./models/instantid/ControlNetModel/"

# load IdentityNet
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)
pipe = StableDiffusionXLInstantIDPipeline.from_single_file(
    "./models/checkpoints/realvisxlV40_v40LightningBakedvae.safetensors",
    controlnet=controlnet, torch_dtype=torch.float16
)
pipe.cuda()

# load adapter
pipe.load_ip_adapter_instantid(face_adapter)

# Load ipadapter
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "./models/ipadapters",
    subfolder="sdxl_models/image_encoder",
    torch_dtype=torch.float16,
    #weight_name="ip-adapter-plus_sdxl_vit-h.safetensors"
).to("cuda")

# Apply adapter to pipe
pipe.image_encoder = image_encoder

pipe.load_ip_adapter("./models/ipadapters", subfolder="sdxl_models", weight_name="ip-adapter-plus_sdxl_vit-h.safetensors")
pipe.set_ip_adapter_scale(1.3)

# Optimisation
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()

image = Image.open("img1.png")

face_info = app_face.get(cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR))
face_info = sorted(face_info, key=lambda x:(x['bbox'][2]-x['bbox'][0])*(x['bbox'][3]-x['bbox'][1]))[-1]  # only use the maximum face
face_emb = face_info['embedding']


prompt = "prompt"

kps = Image.open("kps_standard.png")

ipadapter_image = Image.open("img2.png")

#encod = pipe.image_encoder(ipadapter_image)


# Compel setup for SDXL prompt weighting (compel_proc was missing above)
compel_proc = Compel(
    tokenizer=[pipe.tokenizer, pipe.tokenizer_2],
    text_encoder=[pipe.text_encoder, pipe.text_encoder_2],
    returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
    requires_pooled=[False, True],
)
prompt_embed, pooled = compel_proc(prompt)

image = pipe(
    prompt,
    width=768,
    height=1024,
    image_embeds=face_emb,
    image=kps,
    seed=42,
    ip_adapter_image=ipadapter_image,
    controlnet_conditioning_scale=0.7,
    control_guidance_end=0.7,
    num_inference_steps=6,
    guidance_scale=3,
).images[0]
```

And I got:

```
ValueError: <class 'diffusers.models.unet_2d_condition.UNet2DConditionModel'> has the config param `encoder_hid_dim_type` set to 'ip_image_proj' which requires the keyword argument `image_embeds` to be passed in `added_conditions`
```

Is there any workaround?

I tried adding arguments to the pipe call:

```python
    added_conditions="image_embeds",  # first test
    added_cond_kwargs={  # second one
        "image_embeds": face_emb
    },
```

I also tried to pre-encode the image:

```python
prompt_image_emb = pipe._encode_prompt_image_emb(
    face_emb,
    "cuda",
    num_images_per_prompt=1,
    dtype=torch.float16,
    do_classifier_free_guidance=False,
)
```

then:

```python
image = pipe(..., encoder_hidden_states=prompt_image_emb, ...)
```

And it still gives me the same error.

If anyone has a clue about this, please share.

Thanks in advance,

The error you’re encountering, `ValueError: <class 'diffusers.models.unet_2d_condition.UNet2DConditionModel'> has the config param 'encoder_hid_dim_type' set to 'ip_image_proj'`, means that the UNet2DConditionModel expects image_embeds to arrive through added_conditions, so the image embeddings need to be passed correctly during inference.
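To confirm what the UNet was configured to expect, you can inspect its config directly. A minimal check, assuming a standard diffusers `UNet2DConditionModel` (the attribute name comes straight from your error message):

```python
# Inspect how the UNet expects extra conditioning to arrive:
# 'ip_image_proj' means the forward pass looks for `image_embeds`
# in the added conditioning kwargs.
print(pipe.unet.config.encoder_hid_dim_type)
```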

Here are a few things to check and try as potential solutions:

1. Passing added_conditions Correctly

The image_embeds need to be passed under added_conditions when you’re calling the pipeline. You can try modifying your pipeline call like this:

```python
image = pipe(
    prompt,
    width=768,
    height=1024,
    added_conditions={"image_embeds": face_emb},  # Pass the embeddings here
    image=kps,
    seed=42,
    ip_adapter_image=ipadapter_image,
    controlnet_conditioning_scale=0.7,
    control_guidance_end=0.7,
    num_inference_steps=6,
    guidance_scale=3,
).images[0]
```

This passes image_embeds as part of the added_conditions dictionary, which is expected by the model.

2. Ensure Correct Embedding Handling

If the face_emb is already an embedding, ensure it’s in the correct format. The face_emb needs to be a tensor with the shape expected by the model (for example, [batch_size, embed_dim]).

```python
# Ensure face_emb is in the correct shape
face_emb = torch.tensor(face_emb).unsqueeze(0).to("cuda")  # Add batch dimension if needed
```

3. Using encode_prompt_image_emb

You mentioned trying to use pipe._encode_prompt_image_emb(). While it may not be part of the official public API (the leading underscore marks it as internal), make sure you’re calling it correctly. It should encode the image embeddings, and you can then pass those embeddings as added_conditions:

```python
prompt_image_emb = pipe._encode_prompt_image_emb(
    face_emb,
    device="cuda",
    num_images_per_prompt=1,
    dtype=torch.float16,
    do_classifier_free_guidance=False,
)

image = pipe(
    prompt,
    width=768,
    height=1024,
    added_conditions={"image_embeds": prompt_image_emb},  # Use the encoded embeddings here
    image=kps,
    seed=42,
    ip_adapter_image=ipadapter_image,
    controlnet_conditioning_scale=0.7,
    control_guidance_end=0.7,
    num_inference_steps=6,
    guidance_scale=3,
).images[0]
```

Ensure that the embeddings from pipe._encode_prompt_image_emb() are the ones you’re passing into added_conditions.

4. Debugging and Inspecting

If you’re still having trouble, I recommend printing or inspecting the shapes and types of the embeddings at various points to verify that they are being passed correctly. Also, check if any additional parameters are required by the model or pipeline that might have been overlooked.
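For example, a quick debugging sketch along those lines (a sketch only, reusing the variable names from your snippet; `face_emb` is the insightface embedding and `prompt_image_emb` is the helper’s output):

```python
import torch

# Print shape/dtype/device of each conditioning tensor before calling the pipe
for name, value in [("face_emb", face_emb), ("prompt_image_emb", prompt_image_emb)]:
    t = torch.as_tensor(value)
    print(f"{name}: shape={tuple(t.shape)}, dtype={t.dtype}, device={t.device}")

# InstantID's ArcFace embedding is 512-d, so a [1, 512] tensor matching the
# UNet's dtype/device is a reasonable target shape to check against
face_emb = torch.as_tensor(face_emb, dtype=torch.float16, device="cuda").reshape(1, -1)
```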

Lastly, review any available documentation or forum discussions for the pipeline you’re using (e.g., the StableDiffusionXLInstantIDPipeline) to ensure compatibility with the image embedding mechanism.

Using the kwarg `added_conditions={"image_embeds": face_emb}` still gives me errors.

If `image_embeds=face_emb` is in the arguments:

```
ValueError: <class 'diffusers.models.unet_2d_condition.UNet2DConditionModel'> has the config param `encoder_hid_dim_type` set to 'ip_image_proj' which requires the keyword argument `image_embeds` to be passed in `added_conditions`
```

Without it:

```
RuntimeError: Could not infer dtype of NoneType
```

It seems that it needs image_embeds as an argument.

I checked, and the embeddings are passed correctly.

I also tried reshaping the face embedding, both as a torch tensor and as a numpy array (`face_emb_2 = face_emb.reshape(1, 512)`), and it still gives me the same error.
