Projected gradient descent on autoregressive models

I am doing text summarization together with a trained classifier that assigns a label to each generated summary. I would like to measure how far apart certain classifier labels are from each other by running adversarial attacks and visualizing the results in the summarizer’s encoder embedding space. Is there any part of the Hugging Face library that supports, e.g., projected gradient descent through the (autoregressive) decoder back to the encoder embeddings?
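
To make the question concrete, here is a minimal sketch of the kind of attack I mean. It is an assumption-laden toy, not anything from the Hugging Face library: a small linear softmax classifier stands in for the real summarizer plus classifier, the "embedding" is a plain 2-dimensional vector, and all names (`pgd_targeted`, `min_flip_radius`, `W`, `b`, `x0`) are hypothetical. In the real setup the gradient would come from backpropagating through the classifier and the decoder rather than from the closed-form expression used here.

```python
import math

def logits(W, b, x):
    # Linear classifier: one logit per label.
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def sign(g):
    return (g > 0) - (g < 0)

def pgd_targeted(W, b, x0, target, eps, alpha=0.05, steps=50):
    """Targeted L-infinity PGD on an embedding vector x0 (toy version)."""
    x = list(x0)
    for _ in range(steps):
        p = softmax(logits(W, b, x))
        # Gradient of cross-entropy toward `target` w.r.t. x: W^T (p - onehot).
        err = [p_k - (1.0 if k == target else 0.0) for k, p_k in enumerate(p)]
        grad = [sum(err[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
        # Signed descent step that lowers the loss for the target label.
        x = [x_j - alpha * sign(g_j) for x_j, g_j in zip(x, grad)]
        # Projection: clip back into the eps-ball around the original embedding.
        x = [min(max(x_j, o_j - eps), o_j + eps) for x_j, o_j in zip(x, x0)]
    return x

def min_flip_radius(W, b, x0, target, radii):
    """Smallest radius in `radii` at which PGD reaches `target`, else None."""
    for eps in radii:
        if predict(W, b, pgd_targeted(W, b, x0, target, eps)) == target:
            return eps
    return None

# Toy data (assumed for illustration): two labels, 2-d embedding.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x0 = [1.0, 0.0]  # classified as label 0
radius = min_flip_radius(W, b, x0, target=1, radii=[i / 10 for i in range(1, 11)])
```

The smallest perturbation radius that flips label 0 to label 1 is what I would like to use as a rough "distance" between the two labels, and the perturbed embeddings returned by `pgd_targeted` are what I would visualize.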