Feature extraction pipeline vs. model hidden states

Hi,

This might be a very naive question, but I'm not able to understand which features are extracted by the "feature-extraction" pipeline. Here is what I have tried so far:

import torch
from transformers import BertTokenizer, pipeline

text = 'I will learn the embeddings for this sentence now'
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
feature_extractor = pipeline('feature-extraction', model='bert-base-multilingual-uncased', tokenizer=tokenizer)
try:
  features = torch.tensor(feature_extractor(text))
  print(features)
except RuntimeError:
  print("Error")

which gives me the following output:

tensor([[[ 0.0069,  0.0085,  0.0350,  ..., -0.0127,  0.0450, -0.0289],
         [ 0.1185,  0.3802, -0.0386,  ..., -0.2473,  0.4393, -0.5417],
         [-0.1408, -0.2094, -0.1027,  ..., -0.0744,  0.3208, -1.0260],
         ...,
         [-0.0517,  0.0047, -0.1229,  ..., -0.0555,  0.4420, -0.2788],
         [-0.1698,  0.2366, -0.3831,  ..., -0.0218,  0.3211, -0.3036],
         [-0.4897,  0.3905, -0.1925,  ..., -0.0605,  0.2510, -0.8872]]])
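(For reference, that tensor has shape `(batch, num_tokens, hidden_size)`; a quick sketch to confirm, assuming the same setup as above but letting the pipeline build its own matching tokenizer:)

```python
import torch
from transformers import pipeline

# Let the pipeline load the tokenizer that matches the model checkpoint.
feature_extractor = pipeline('feature-extraction', model='bert-base-multilingual-uncased')
features = torch.tensor(feature_extractor('I will learn the embeddings for this sentence now'))

# One embedding per wordpiece token (plus [CLS]/[SEP]), each of hidden size 768.
print(features.shape)  # torch.Size([1, num_tokens, 768])
```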

However, I then tried to extract the hidden states from the model with the following code:

from transformers import BertModel, BertTokenizer

class BertFeatureExtractor(object):
  def __init__(self, model_name):
    self.tokenizer = BertTokenizer.from_pretrained(model_name)
    self.model = BertModel.from_pretrained(model_name)
    self.model.eval()

  def extract(self, text):
    encoded_input, output = None, None
    try:
      encoded_input = self.tokenizer(text, return_tensors='pt')
      output = self.model(**encoded_input, output_hidden_states=True)
    except RuntimeError:
      print(f'Model cannot learn embeddings for {text}')
    return encoded_input, output

I then get the embeddings as:

feat_extractor = BertFeatureExtractor('bert-base-multilingual-uncased')
with torch.no_grad():
  encoded_input, output = feat_extractor.extract(text)

Neither output['hidden_states'] nor output['last_hidden_state'] matches the output of the feature-extraction pipeline. Is that expected? Are the features calculated by taking some combination of the layers? If so, how? Or is the feature extraction done in a different way altogether?

I realized I had made a mistake in the tokenizer I was using in the two ways of getting the embeddings (cased in one, uncased in the other). With a matching tokenizer, the "feature-extraction" pipeline gives the last hidden state.
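For anyone hitting the same issue, here is a minimal sketch of the check (assuming the same checkpoint is used for both the tokenizer and the model) that the pipeline output equals `last_hidden_state`:

```python
import torch
from transformers import BertModel, BertTokenizer, pipeline

model_name = 'bert-base-multilingual-uncased'
text = 'I will learn the embeddings for this sentence now'

# Pipeline path: the pipeline builds its own tokenizer from the same checkpoint.
feature_extractor = pipeline('feature-extraction', model=model_name)
pipeline_features = torch.tensor(feature_extractor(text))

# Manual path: same checkpoint for tokenizer and model.
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)
model.eval()
with torch.no_grad():
  encoded_input = tokenizer(text, return_tensors='pt')
  output = model(**encoded_input)

# The pipeline output is the model's last hidden state (up to float tolerance).
print(torch.allclose(pipeline_features, output.last_hidden_state, atol=1e-5))
```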
