In my current project, I am working on training encoder-decoder models (BART, T5, etc.) and the Transformers library has been absolutely invaluable! After seeing several Bertology analyses (i.e. looking at the information the model’s attention mechanism learns to attend to), I would like to know if a similar analysis is possible with the BART and T5 models in the Hugging Face library. Any recommendations are certainly appreciated!