DistilGPT2 only predicts <|endoftext|> if the context is full

I’m taking single-token next-word predictions from DistilGPT2 with a completely full input buffer, and it always predicts <|endoftext|>. If I only fill the buffer halfway, I get a variety of predictions. I’m guessing this is a quirk of the model, but can anyone tell me if I’m doing something wrong?
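
For reference, TokenDataset is a small helper of my own; it’s roughly equivalent to this simplified sketch (it reads a pre-tokenized corpus from a CSV and serves random windows of token ids):

import csv
import random

class TokenDataset:
    """Loads a flat sequence of token ids from a CSV and serves random windows."""

    def __init__(self, path):
        with open(path) as f:
            # one token id per cell; flatten every row into one long sequence
            self.tokens = [int(tok) for row in csv.reader(f) for tok in row]

    def get_random_sample(self, length):
        # random contiguous window of `length` token ids
        start = random.randint(0, len(self.tokens) - length)
        return self.tokens[start:start + length]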

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

dataset = TokenDataset(tokens_csv)  # tokens_csv is the path to my pre-tokenized corpus
print(f"Loaded token index from {tokens_csv}")

model = GPT2LMHeadModel.from_pretrained('distilgpt2')
tokenizer = GPT2Tokenizer.from_pretrained('distilgpt2')
print(f"Loaded distilgpt2")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()  # from_pretrained already returns eval mode, but be explicit: no dropout at inference
print(f"Running on {device}")

max_length = model.config.n_ctx  # 1024 for distilgpt2
print(f"Context length {max_length}")

for i in range(10):
    # draw max_length consecutive token ids from the corpus
    sample_tokens = dataset.get_random_sample(max_length)
    input_ids = torch.tensor(sample_tokens).unsqueeze(0).to(device)

    with torch.no_grad():
        outputs = model(input_ids)
        logits = outputs.logits

    # greedy prediction: most likely token at the final position
    predicted_token_id = torch.argmax(logits[0, -1, :]).item()
    predicted_token = tokenizer.decode([predicted_token_id])

    print(f"Predicted next token: {predicted_token}")

The output for a context of length max_length is:

Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>
Predicted next token: <|endoftext|>

If I halve max_length, I get this:

Predicted next token: ��
Predicted next token: risome
Predicted next token: -
Predicted next token: ��
Predicted next token:  season
Predicted next token:  Rutgers
Predicted next token:  the
Predicted next token:  important
Predicted next token: ��
Predicted next token:  the
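
To make sure the argmax isn’t just hiding a near-tie, I’m also going to print the top-5 candidates with their probabilities at the last position. Untested sketch, meant to go inside the loop above:

# probe: inspect the top-5 next-token candidates instead of only the argmax
probs = torch.softmax(logits[0, -1, :], dim=-1)
top_probs, top_ids = torch.topk(probs, k=5)
for p, tok_id in zip(top_probs.tolist(), top_ids.tolist()):
    print(f"{tokenizer.decode([tok_id])!r}: {p:.4f}")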