I’m getting single-token predictions from DistilGPT2 using a full input buffer, and it always returns <|endoftext|>. If I only fill the buffer halfway I get a variety of predictions. I’m guessing this is a quirk of this model. Can anyone tell me if I’m doing something wrong?
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

dataset = TokenDataset(tokens_csv)
print(f"Loaded token index from {tokens_csv}")

model = GPT2LMHeadModel.from_pretrained('distilgpt2')
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')
print("Loaded distilgpt2")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()  # disable dropout for inference
print(f"Running on {device}")

max_length = model.config.n_ctx  # 1024 for distilgpt2
print(f"Context length {max_length}")

for i in range(10):
    sample_tokens = dataset.get_random_sample(max_length)
    input_ids = torch.tensor(sample_tokens).unsqueeze(0).to(device)
    with torch.no_grad():
        outputs = model(input_ids, labels=input_ids)
        loss, logits = outputs[:2]
        # Greedy pick: highest-scoring candidate for the token after the window
        predicted_token_id = torch.argmax(logits[0, -1, :]).item()
        predicted_token = tokenizer.decode([predicted_token_id])
        print(f"Predicted next token: {predicted_token}")
The output for a context of length max_length is:
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
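For the second run the only change is the sampled window length, i.e. halving max_length:

    sample_tokens = dataset.get_random_sample(max_length // 2)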
With that change, I get this:
Predicted next token: ��
Predicted next token: risome
Predicted next token: -
Predicted next token: ��
Predicted next token: season
Predicted next token: Rutgers
Predicted next token: the
Predicted next token: important
Predicted next token: ��
Predicted next token: the
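In case my TokenDataset is the culprit, a control run that bypasses it and fills the full context window straight from the tokenizer should settle it (hypothetical sketch, not something I've run yet):

    # Hypothetical control: fill the context window from a plain repeated string
    text = "The quick brown fox jumps over the lazy dog. " * 300
    ids = tokenizer.encode(text)[:max_length]
    input_ids = torch.tensor(ids).unsqueeze(0).to(device)
    with torch.no_grad():
        logits = model(input_ids)[0]
    print(tokenizer.decode([torch.argmax(logits[0, -1, :]).item()]))

If that also prints <|endoftext|> with a full window I'd suspect the model; otherwise, my sampling.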