I have a model that I fine-tuned and saved. Now I would like to use those saved weights to extract the hidden-state embeddings and feed them into other models, such as a CNN, but I am unable to.
Here is my process:
# load pretrained distilbert
model = DistilBertForSequenceClassification.from_pretrained('C:\\Users\\14348\\Desktop\\pretrained_bert',
                                                            output_hidden_states=True)
tokenizer = DistilBertTokenizer.from_pretrained('C:\\Users\\14348\\Desktop\\pretrained_bert')
I tokenized my text in exactly the same way as when I fine-tuned the saved model above. One example looks like this:
b_input_ids[0]
tensor([ 101, 1000, 22190, 10711, 1024, 2093, 3548, 2730, 1999, 8479,
1999, 1062, 2953, 18900, 1000, 2405, 2011, 16597, 2376, 1997,
24815, 4037, 2006, 1020, 2285, 2429, 2000, 1037, 3189, 2013,
22190, 10711, 2874, 1010, 1037, 3067, 7738, 2001, 3344, 2041,
2006, 3329, 3548, 1997, 1996, 4099, 1010, 2040, 2020, 4458,
2083, 2019, 2181, 2006, 1996, 14535, 2480, 1011, 1062, 2953,
18900, 2364, 2346, 1999, 1062, 2953, 18900, 2212, 1997, 2023,
2874, 2012, 2105, 6694, 2023, 2851, 1012, 1996, 3189, 9909,
2008, 2093, 3548, 2020, 2730, 2006, 1996, 3962, 2004, 1037,
2765, 1997, 1996, 8479, 1012, 102, 0, ... ])
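Before passing a batch like this to the model, it can help to sanity-check it: a token id at or above the vocabulary size, or a padded sequence longer than the model's maximum position count, will make an embedding lookup fail. A minimal sketch of such a check, using a small stand-in tensor (the variable names and the two DistilBERT defaults below are assumptions, not taken from the saved model):

```python
import torch

# Stand-in for a tokenized, padded batch (illustrative values only).
b_input_ids = torch.tensor([[101, 1000, 22190, 10711, 102, 0, 0, 0]])

vocab_size = 30522     # DistilBERT-base default vocab size (assumed)
max_positions = 512    # DistilBERT-base default max_position_embeddings (assumed)

# Every id must index a row of the word embedding matrix,
# and the sequence must fit within the position embeddings.
assert int(b_input_ids.max()) < vocab_size
assert b_input_ids.shape[1] <= max_positions
```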
Now when I go to grab the embeddings like so:
# ignore compute graph
with torch.no_grad():
    logits, hidden_states = model(input_ids=b_input_ids,
                                  attention_mask=b_masks)
I get the following error:
IndexError: index out of range in self
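This error message comes from an embedding lookup receiving an out-of-range index. A minimal reproduction with a bare `torch.nn.Embedding` (a sketch, not the author's model; 512 mirrors DistilBERT's position-embedding count):

```python
import torch

# An nn.Embedding raises "index out of range in self" whenever an
# input id is >= num_embeddings.
emb = torch.nn.Embedding(num_embeddings=512, embedding_dim=8)

ok = torch.arange(512).unsqueeze(0)   # ids 0..511: all in range
out = emb(ok)                         # works: shape (1, 512, 8)

try:
    emb(torch.tensor([[512]]))        # id 512: one past the last row
    failed = False
except IndexError:
    failed = True                     # "index out of range in self"
```

The same lookup failure occurs if a batch contains ids outside the tokenizer's vocabulary or sequences padded beyond the model's maximum length.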
But if I tokenize a new sentence on the fly, I can get embeddings for it with no problem:
input_sentence = torch.tensor(tokenizer.encode("My sentence")).unsqueeze(0)
# ignore compute graph
with torch.no_grad():
    logits, hidden_states = model(input_ids=input_sentence)
len(hidden_states)
8
logits
tensor([[ 0.2188, -0.0540]])
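For the stated goal of reusing these embeddings in a CNN, one possible shape of the hand-off looks like the sketch below. The `Conv1d` settings are illustrative assumptions, and a random tensor stands in for the model's actual `hidden_states` output (hidden size 768 for DistilBERT-base):

```python
import torch

# Stand-in for the model's hidden_states tuple: 7 layers of (batch, seq_len, hidden).
hidden_states = tuple(torch.randn(1, 7, 768) for _ in range(7))

last_hidden = hidden_states[-1]            # (batch, seq_len, 768)
cnn_input = last_hidden.permute(0, 2, 1)   # Conv1d expects (batch, channels, seq_len)

conv = torch.nn.Conv1d(in_channels=768, out_channels=32, kernel_size=3)
features = conv(cnn_input)                 # (batch, 32, seq_len - 2)
```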
Thanks for your time and help!