Get score for specific token in vocab set when generating with BLIP

Hi,

I am using a fine-tuned BLIP model for image captioning. I know BLIP generates a caption by picking the highest-scoring token at each position. I want to look up the score of one specific token in the vocabulary, whether or not that token ends up in the generated caption. BLIP uses the BertTokenizer.

for idx, batch in enumerate(val_dataloader):
    filenames = batch.pop('filename')
    pixel_values = batch['pixel_values'].to(device)
    generated_ids = model.generate(pixel_values=pixel_values, max_length=500, output_scores=True, return_dict_in_generate=True)

    for fn in filenames:
        # iterate over each position's vocab set
        for vocab in generated_ids.scores:
            print("vocab: ", vocab)

Here vocab is a tensor with scores for each token in the vocab set. However, there is no corresponding token id for me to be able to look up a specific token’s score.

vocab
tensor([[ 1.3996, -1.0718, -1.0718, ..., -1.0718, -1.0719, -1.0719]],
       device='cuda:0')
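
For what it's worth, here is a minimal sketch of the lookup I have in mind. It assumes that each tensor in scores has shape (batch_size, vocab_size) and that column j holds the score for token id j, which is my reading of the generate() documentation. The toy vocabulary, the toy score values, and the helper name token_score_per_step are all hypothetical stand-ins so the snippet runs on its own:

```python
# Sketch: look up one token's score at every generated position.
# Assumption: each entry of `scores` is indexed as [batch_index][token_id],
# i.e. the position along the last dimension IS the token id.
# (With beam search the first dimension would be batch_size * num_beams.)

def token_score_per_step(scores, token_id, batch_index=0):
    """Return the score of `token_id` at each generation step (hypothetical helper)."""
    return [float(step_scores[batch_index][token_id]) for step_scores in scores]


# Toy stand-ins, NOT real BLIP output: a 4-token vocabulary and two steps.
toy_vocab = {"[PAD]": 0, "a": 1, "dog": 2, "cat": 3}
toy_scores = (
    [[-1.0, 2.5, 0.3, 0.1]],  # step 0, batch of 1
    [[-1.0, 0.2, 1.7, 1.2]],  # step 1, batch of 1
)

dog_id = toy_vocab["dog"]
dog_scores = token_score_per_step(toy_scores, dog_id)
print(dog_scores)
```

With the real objects, I would expect token_id to come from the tokenizer's convert_tokens_to_ids("dog") and scores to be generated_ids.scores, since the same [batch_index][token_id] indexing works on tensors. Two caveats as I understand them: a word that WordPiece splits into several pieces has several ids, and the values are pre-softmax scores rather than probabilities.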

Any help would be appreciated!

Thanks,
Nina