I’m trying to figure out which Serverless Inference APIs can return attention matrices. I know this is controlled by “output_attentions=True”, but I can’t tell whether it is enabled for a given API or model.
Is there a way to search or filter for models that have attention outputs pre-configured? Or could you recommend some Serverless Inference APIs that do support attention outputs?
Thanks!