I am curious to know how I would do this using GPT-2. Thank you for your time!
Hi there, here is a quick way to do this for the last token of a given sentence in PyTorch:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
import torch.nn.functional as F

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Input example
input_txt = "Hello, my name is Sylvain."
inputs = tokenizer(input_txt, return_tensors='pt')
outputs = model(**inputs)

# If you are not on a source install, replace outputs.logits by outputs[0]
predictions = F.softmax(outputs.logits, dim=-1)

thresh = 1e-2
vocab_size = predictions.shape[-1]

# Predictions has one sentence (index 0) and we look at the last token predicted (-1)
idxs = torch.arange(0, vocab_size)[predictions[0, -1] >= thresh]
print(tokenizer.convert_ids_to_tokens(idxs))
```
I can’t thank you enough for your detailed response! I apologize if I am asking too much of this forum, but since I have this question, I am sure others would benefit from an answer as well.
While on this topic, I wonder what steps would need to be taken to extend this code so the output includes multi-token phrases in addition to single words.
I don’t think the generate method returns the probabilities out of the box, so you might have to tweak the generation loop to return them.
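One way to sketch such a tweaked loop (a minimal greedy-decoding example, not the library's `generate` method itself; the prompt and the 5-token phrase length are arbitrary choices for illustration) is to run the model one step at a time and record the probability of each chosen token, then multiply them to score the whole phrase via the chain rule:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch
import torch.nn.functional as F

model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model.eval()

input_txt = "Hello, my name is"  # arbitrary example prompt
input_ids = tokenizer(input_txt, return_tensors='pt').input_ids

token_probs = []  # probability of each greedily chosen token
with torch.no_grad():
    for _ in range(5):  # generate a 5-token "phrase" (length is arbitrary)
        logits = model(input_ids).logits
        probs = F.softmax(logits[0, -1], dim=-1)  # distribution over the next token
        next_id = torch.argmax(probs)
        token_probs.append(probs[next_id].item())
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

phrase = tokenizer.decode(input_ids[0, -5:])
# Chain rule: the phrase probability is the product of per-token probabilities
phrase_prob = torch.tensor(token_probs).prod().item()
print(phrase, phrase_prob)
```

If you are on a recent transformers release, `model.generate(..., return_dict_in_generate=True, output_scores=True)` can also hand back the per-step scores directly, which may save you the manual loop.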