First off, the zero-shot module is amazing. It wraps up a lot of boilerplate that I'd been writing myself into a nice, succinct interface. That said, I'm having trouble getting the gradients of intermediate layers. Let's take an example:
from transformers import pipeline
import torch
model_name = 'facebook/bart-large-mnli'
nlp = pipeline("zero-shot-classification", model=model_name)
responses = ["I'm having a great day!!"]
hypothesis_template = 'This person feels {}'
candidate_labels = ['happy', 'sad']
nlp(responses, candidate_labels, hypothesis_template=hypothesis_template)
This works well! The output is:
{'sequence': "I'm having a great day!!",
'labels': ['happy', 'sad'],
'scores': [0.9989933371543884, 0.0010066736722365022]}
What I'd like to do, however, is look at the gradients of the input tokens to see which tokens are important. This is in contrast to looking at the attention heads (which is another viable tactic). Digging into the internals of the module, I can get the logits and embedding layers:
inputs = nlp._parse_and_tokenize(responses, candidate_labels, hypothesis_template)
predictions = nlp.model(**inputs, return_dict=True, output_hidden_states=True)
predictions['logits']
tensor([[-3.1864, -0.0714, 3.2625],
[ 4.5919, -1.9473, -3.6376]], grad_fn=<AddmmBackward>)
This is expected, as the label for “happy” is index 0 and the entailment index for this model is 2, so the value of 3.2625 is an extremely strong signal. The label for “sad” is 1 and the contradiction index is 0, so the value of 4.5919 is also the correct answer.
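As a sanity check on that reading of the logits, the pipeline's scores above can be reproduced by softmaxing each label's entailment logit (index 2) across the candidate labels. This is a sketch using the logit values printed above, assuming single-label scoring:

```python
import math

# Entailment logits (index 2 of each row) taken from the output above.
entailment_logits = {'happy': 3.2625, 'sad': -3.6376}

# Softmax over the entailment logits across candidate labels.
exps = {label: math.exp(z) for label, z in entailment_logits.items()}
total = sum(exps.values())
scores = {label: e / total for label, e in exps.items()}
print(scores)  # {'happy': ~0.99899, 'sad': ~0.00101} -- matches the pipeline output
```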
Great! Now I should be able to look at the first embedding layer and check out the gradient with respect to the happy entailment scalar:
layer = predictions['encoder_hidden_states'][0]  # output of the embedding layer
layer.retain_grad()  # ask autograd to keep the gradient for this non-leaf tensor
predictions['logits'][0][2].backward(retain_graph=True)  # entailment logit for 'happy'
Unfortunately, layer.grad is None. I've tried almost everything I can think of, and now I'm a bit stuck. Thanks for the help!
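For what it's worth, a minimal check in plain torch (no transformers involved) shows retain_grad populating .grad for a non-leaf tensor, which is exactly the behavior I expected from the hidden state above:

```python
import torch

x = torch.randn(3, requires_grad=True)  # leaf tensor
h = x * 2                               # non-leaf, analogous to a hidden state
h.retain_grad()                         # keep the gradient despite being non-leaf
h.sum().backward()
print(h.grad)                           # tensor([1., 1., 1.]) -- not None
```

So the mechanism itself works in isolation; something about how the pipeline's hidden states relate to the graph must be different.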