When calling model.predict() on a Hugging Face Llama model, I get a loss of 1.16, which is greater than 1. If the language-model loss is cross-entropy, how can it be greater than 1?
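For context on the question itself: cross-entropy is not bounded by 1. For a single token it equals -ln(p) of the probability assigned to the correct token (in nats, since PyTorch's CrossEntropyLoss uses the natural log), so it exceeds 1 whenever that probability falls below 1/e ≈ 0.368. A minimal sketch, with a hypothetical probability chosen for illustration:

```python
import math

# Per-token cross-entropy is -ln(p_correct), measured in nats.
# It exceeds 1 whenever p_correct < 1/e ≈ 0.368, which is common
# over a large next-token vocabulary.
p_correct = 0.31  # hypothetical probability assigned to the true token
loss = -math.log(p_correct)
print(loss)  # ≈ 1.17, already greater than 1

# A uniform guess over a ~32000-token vocabulary (Llama's size) gives
# the baseline loss ln(32000) ≈ 10.37, so 1.16 is actually quite low.
print(math.log(32000))
```

The only cross-entropy that is bounded by 1 is the binary case in bits; multi-class cross-entropy in nats can be arbitrarily large.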