To generate text sequences with GPT-Neo, I first load all the relevant components for sequence generation with GPTNeoForCausalLM.
from transformers import AutoTokenizer, GPTNeoForCausalLM
import torch
from torch.nn import functional as F
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125m")
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")
There are two ways I can generate input_ids and attention_mask.
- I take the standard approach without padding
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
- I use padding instead
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'left'
tokenizer.truncation_side = 'left'
no_items_for_history = 30
inputs = tokenizer.encode_plus("Hello, my dog is cute", max_length=no_items_for_history, padding='max_length', truncation=True, return_tensors="pt")
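To make explicit what I expect the padded variant to produce, here is a minimal plain-Python sketch of left padding and left truncation on a list of token ids. The helper `left_pad_and_truncate` is hypothetical and not part of transformers; it only mimics my understanding of `padding_side = 'left'` and `truncation_side = 'left'`. The token ids and the pad id 50256 are the ones the tokenizer reports for this prompt.

```python
# Hypothetical helper (not a transformers API): mimics left padding and
# left truncation on a plain list of token ids. Assumes max_length > 0.
def left_pad_and_truncate(token_ids, max_length, pad_id):
    # Left truncation drops the oldest tokens and keeps the rightmost ones.
    kept = token_ids[-max_length:]
    n_pad = max_length - len(kept)
    # Left padding puts the pad tokens in front of the real tokens.
    input_ids = [pad_id] * n_pad + kept
    # The mask is 0 on padding and 1 on real tokens.
    attention_mask = [0] * n_pad + [1] * len(kept)
    return input_ids, attention_mask


prompt_ids = [15496, 11, 616, 3290, 318, 13779]  # "Hello, my dog is cute"
input_ids, attention_mask = left_pad_and_truncate(
    prompt_ids, max_length=30, pad_id=50256
)
print(input_ids)
print(attention_mask)
```

With `max_length=30` this gives 24 pad ids followed by the 6 prompt ids, and a mask of 24 zeros followed by 6 ones.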
Then, for both approaches, I loop iteratively to generate the sequence one token at a time.
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']
for i in range(10):
    if i == 0:
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=inputs["input_ids"])
    else:
        outputs = model(input_ids=new_input_ids, attention_mask=attention_mask, past_key_values=past_key_values)
    loss = outputs.loss
    logits = outputs.logits[:, -1, :]
    logits = F.softmax(logits, dim=1)
    topk_values, topk_indices = torch.topk(logits, 5)
    inputs_in_topk = torch.multinomial(topk_values, num_samples=1, replacement=True)
    new_input_ids = torch.gather(topk_indices, 1, inputs_in_topk)
    past_key_values = outputs.past_key_values
    attention_mask = torch.concat((attention_mask, torch.ones(1, 1).to(attention_mask.device)), dim=1)
    input_ids = torch.concat((input_ids, new_input_ids), dim=1)
print(tokenizer.decode(input_ids.tolist()[0], skip_special_tokens=True))
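For reference, the sampling step inside the loop (softmax, keep the 5 most probable tokens, then draw one of them in proportion to its probability) can be sketched in plain Python without torch. The function names and the toy logits are made up for illustration; `random.choices` stands in for `torch.multinomial`, which likewise treats unnormalized values as weights.

```python
import math
import random


def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_k(logits, k, rng):
    probs = softmax(logits)
    # Indices of the k most probable tokens, most probable first.
    top_indices = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    top_probs = [probs[i] for i in top_indices]
    # Weights are used relative to each other, so they need not sum to 1.
    return rng.choices(top_indices, weights=top_probs, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 0.5, -1.0, 3.0, 0.0, 1.5, -2.0]  # made-up values
next_token = sample_top_k(toy_logits, k=5, rng=rng)
print(next_token)
```

The drawn index is always one of the five highest-scoring entries, here one of 3, 0, 5, 1, or 4.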
Here is the problem:
The starting input_ids and attention_mask for the first approach look like:
input_ids = tensor([[15496, 11, 616, 3290, 318, 13779]])
attention_mask = tensor([[1, 1, 1, 1, 1, 1]])
The output looks very sensible:
Hello, my dog is cute! This post is about dogs and cats
However, for the second approach, the starting input_ids and attention_mask look like:
input_ids = tensor([[50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 15496, 11, 616, 3290, 318, 13779]])
attention_mask = tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]])
and it always generates nonsense like:
Hello, my dog is cute pet is my pet pet pet is my dog is
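One difference between the two inputs that may be relevant is where the six prompt tokens sit in the sequence. The sketch below compares the position indices they would receive under two numbering schemes. It rests on an assumption I have not verified here: that when no position_ids are passed, positions are simply counted 0, 1, 2, ... from the left, padding included. Both helper functions are hypothetical and only illustrate the arithmetic.

```python
def default_positions(attention_mask):
    # Assumed default: count positions from the left, padding included.
    return list(range(len(attention_mask)))


def mask_aware_positions(attention_mask):
    # Number only the real tokens (running count of ones, minus one).
    # Padded slots get a dummy value of 1, since they are masked out anyway.
    positions, seen = [], 0
    for m in attention_mask:
        seen += m
        positions.append(seen - 1 if m == 1 else 1)
    return positions


unpadded_mask = [1] * 6             # first approach
padded_mask = [0] * 24 + [1] * 6    # second approach

print(default_positions(unpadded_mask))       # [0, 1, 2, 3, 4, 5]
print(default_positions(padded_mask)[-6:])    # [24, 25, 26, 27, 28, 29]
print(mask_aware_positions(padded_mask)[-6:]) # [0, 1, 2, 3, 4, 5]
```

Under that assumption, the same prompt is seen at positions 24 to 29 in the padded case instead of 0 to 5, whereas mask-aware numbering would give identical positions in both cases.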
Question: Do you know how to make it work with padding, i.e., the second approach?