Softmax vs logits

Why do we need to apply softmax after getting the logit values? I know it is said to normalise the scores and give them a probabilistic interpretation. But isn't the point of the logits/softmax scores simply to determine which value is larger and then infer the label from that?
For example, suppose I get logit scores of [-4.2095, 4.6053], where -4.2095 corresponds to label0 and 4.6053 to label1. Since 4.6053 > -4.2095, I would take label1 as my prediction. If I instead apply softmax to the logits, I get [1.4850e-04, 9.9985e-01], and with these softmax scores I would still infer that the predicted label is label1.


If you just want the predicted class, you don't need the softmax: as you pointed out, you can simply take the index of the maximum logit. Softmax is a monotonic transformation, so it never changes which index is largest. It converts the logits into probabilities, so use it when you want a confidence score for each prediction, not just the winning label.
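A minimal sketch of both points, using only the standard library (the numbers are the ones from the question; the `softmax` helper here is an illustrative implementation, not a specific framework API):

```python
import math

def softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability;
    # this shift does not change the resulting probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-4.2095, 4.6053]
probs = softmax(logits)
print(probs)  # roughly [1.485e-04, 9.9985e-01], as in the question

# Softmax is monotonic, so the argmax over logits and probabilities agrees:
# both point at label1.
assert probs.index(max(probs)) == logits.index(max(logits)) == 1
```

The probabilities sum to 1, so unlike raw logits they can be read as the model's confidence (here, about 99.99% for label1), which matters when you want to threshold predictions or compare confidences across examples.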
