For example, suppose I have the input [CLS] I love apple [SEP] I hate apple [SEP].
I want to force the attention weight between the second token 'I' and the third token 'love' to always be 0.5 in every attention layer during testing (so there is no backpropagation involved). How do I do this? I know this operation sounds unreasonable, but I need it for a special purpose. Thank you.
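Conceptually, what I want looks like the sketch below: plain scaled dot-product attention where the single weight is overwritten after the softmax (and the rest of that query's row is rescaled so it still sums to 1). The tensor shapes, indices, and function name are just my assumptions for illustration; for a real pretrained model I assume I would have to patch each layer's own attention forward to do the same thing.

```python
import torch
import torch.nn.functional as F

# From my example: the 2nd token "I" (index 1) attending to the
# 3rd token "love" (index 2), pinned to 0.5 in every head.
QUERY_IDX, KEY_IDX, FORCED_WEIGHT = 1, 2, 0.5

def attention_with_forced_weight(q, k, v):
    """Scaled dot-product attention with one attention weight pinned.

    q, k, v: (batch, heads, seq_len, head_dim)  -- assumed shapes
    """
    d = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d ** 0.5   # (B, H, L, L)
    probs = F.softmax(scores, dim=-1)                          # rows sum to 1

    # Rescale the rest of the query row so it still sums to 1 after pinning.
    row = probs[:, :, QUERY_IDX, :]                            # (B, H, L)
    other_sum = row.sum(-1, keepdim=True) - row[:, :, KEY_IDX:KEY_IDX + 1]
    new_row = row * (1.0 - FORCED_WEIGHT) / other_sum
    new_row[:, :, KEY_IDX] = FORCED_WEIGHT
    probs[:, :, QUERY_IDX, :] = new_row

    return torch.matmul(probs, v)

# Inference only, so no gradients are needed.
with torch.no_grad():
    B, H, L, D = 1, 12, 8, 64
    q, k, v = (torch.randn(B, H, L, D) for _ in range(3))
    out = attention_with_forced_weight(q, k, v)
    print(out.shape)   # torch.Size([1, 12, 8, 64])
```

Is there a recommended way to inject this kind of override into every layer of an existing model at test time?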