Partial modification of the "instruct-pix2pix" model

Hello everyone, I am a master’s student working on my final thesis. I’ve been asked to conduct tests by modifying the text attention function used by the UNet in instruct-pix2pix, specifically by trying out the SwiGLU and ReGLU functions. Unfortunately, I’m not sure how to proceed.
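For reference, SwiGLU and ReGLU are gated linear units: the input is projected to twice the output width, split into a value half and a gate half, and the value is multiplied by SiLU (Swish) or ReLU of the gate. Below is a minimal PyTorch sketch of that idea. The class name `GatedLinearUnit` and its arguments are hypothetical and are not taken from instruct-pix2pix or any library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedLinearUnit(nn.Module):
    """Gated linear unit: value * activation(gate). Hypothetical helper."""

    def __init__(self, dim_in: int, dim_out: int, kind: str = "swiglu"):
        super().__init__()
        activations = {"swiglu": F.silu, "reglu": F.relu}
        if kind not in activations:
            raise ValueError(f"unknown kind: {kind!r}")
        self.activation = activations[kind]
        # A single projection produces both the value half and the gate half.
        self.proj = nn.Linear(dim_in, dim_out * 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the last dimension into two equal halves: value and gate.
        value, gate = self.proj(x).chunk(2, dim=-1)
        return value * self.activation(gate)
```

The only difference between the two variants is the function applied to the gate, so both can share one module and be selected with `kind`.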

Steps I’ve tried:

- Replacing the UNet (I ran into compatibility errors even though the replacement object was of the correct type).
- Using the network available on GitHub (unfortunately, it comes without optimizations, and I don’t have the necessary resources to run it).
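As an alternative to replacing the whole UNet, one option might be to keep the loaded model and swap only the activation submodules in place. The sketch below shows this on a toy model in plain PyTorch. Everything named here is an assumption for illustration: `ToyGEGLU` stands in for whatever gated activation the real UNet uses, and `SwiGLU` and `swap_modules` are hypothetical helpers. Which class the real model uses, and whether it exposes a `proj` attribute, would need to be checked by printing the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyGEGLU(nn.Module):
    """Stand-in for the gated activation already inside the model (assumption)."""

    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_out * 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj(x).chunk(2, dim=-1)
        return value * F.gelu(gate)


class SwiGLU(nn.Module):
    """Same layout, SiLU gate; reuses an existing projection layer."""

    def __init__(self, proj: nn.Linear):
        super().__init__()
        self.proj = proj

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj(x).chunk(2, dim=-1)
        return value * F.silu(gate)


def swap_modules(model: nn.Module, old_cls: type, factory) -> int:
    """Replace each direct instance of old_cls with factory(old); return the count."""
    count = 0
    # Snapshot the module list first, because the tree is mutated while looping.
    for parent in list(model.modules()):
        for name, child in list(parent.named_children()):
            if type(child) is old_cls:
                setattr(parent, name, factory(child))
                count += 1
    return count


# Toy usage: the projection weights (and their device/dtype) are kept as they are.
toy = nn.Sequential(nn.Linear(8, 8), ToyGEGLU(8, 16), nn.Linear(16, 8))
n = swap_modules(toy, ToyGEGLU, lambda old: SwiGLU(old.proj))
```

Because the surrounding layers and tensor shapes stay untouched, the rest of the pipeline should see the same object as before. The pretrained weights were trained with the original activation, though, so the modified model would most likely need fine-tuning before its outputs are meaningful.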
Thank you very much for your help.

P.S. I am open to any suggestions. Unfortunately, I’ve been left to figure things out on my own, as this is one of the first projects of this kind for my professors, who specialize in NLP tasks.