SDXL custom pipeline - Input to unet? - Why 2 text encoders?

@asrielh no unfortunately I haven’t made any progress on this. Conceptually my understanding for stable-diffusion-v1-4 is that the components are connected like this: tokenizer → text_encoder → unet → vae. I can’t make sense of the two two text encoders & tokenizers in SDXL.