This looks like a tool for explaining transformer predictions, but how do I interpret the various layers, heads, etc.? It seems to me that it is mainly used for explaining attention itself, rather than how attention influenced the predictions?