A forward pass through the model should be deterministic, as far as I understand: the same input sequence gives the same logits, which are turned into next-token prediction probabilities. The randomness I'm aware of in most LLMs comes from how you decide to pick the next token (e.g., greedy decoding, top-k sampling, beam search), not from the forward pass through the network.
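A minimal sketch of the distinction, using toy hand-written logits in place of a real forward pass (the array values here are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

# Stand-in for the logits a forward pass would produce for one fixed input.
# With the same weights and the same input, these come out identical on
# every call -- the forward pass itself is deterministic.
logits = np.array([2.0, 1.0, 0.5, -1.0])
probs = softmax(logits)

# Greedy decoding: always take the highest-probability token.
# Deterministic -- repeated calls give the same token id.
greedy_token = int(np.argmax(probs))

# Sampling: draw a token from the distribution. This is where the
# randomness enters; different runs can pick different tokens unless
# you seed the RNG.
rng = np.random.default_rng()
sampled_token = int(rng.choice(len(probs), p=probs))
```

So the logits (and `probs`) are fixed by the input; only the sampling step at the end is stochastic, and seeding the RNG (or using greedy decoding) removes even that.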