Logits are the raw output scores from a model's final layer, before applying any transformation. They represent unnormalized confidence levels for each option or class.
The softmax function transforms logits into probabilities that sum to 1. It allows the model to express confidence for each class in a probabilistic way.
Each value is exponentiated, then divided by the sum of all exponentiated values, making the outputs valid probabilities.
Normalization ensures that outputs are meaningful probabilities, helping models make interpretable and fair decisions.
To avoid overflow/underflow from large exponentials, we subtract the maximum logit before computing softmax. This doesn't change the result, but it prevents errors.
Tip: Always compute softmax as exp(logits - max(logits)) / sum(exp(logits - max(logits)))
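The tip above can be sketched in pure Python (the function name and inputs are illustrative, not from any particular library):

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)                            # largest logit
    exps = [math.exp(x - m) for x in logits]   # every exponent is <= 0, so no overflow
    total = sum(exps)
    return [e / total for e in exps]

# Without the shift, math.exp(1000.0) overflows; with it, the result is exact.
print(softmax([1000.0, 1000.0]))  # → [0.5, 0.5]
```

Subtracting the maximum only rescales numerator and denominator by the same factor, which is why the output is unchanged.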
Softmax is a function that converts a list of numbers (logits) into a probability distribution. It's widely used in classification problems and attention mechanisms in neural networks.
Example: For inputs [2.0, 1.0, 0.1], softmax outputs roughly:
[0.66, 0.24, 0.10]
These probabilities sum to 1.
You can control the sharpness of softmax with a temperature parameter (T):
softmax(xᵢ) = e^(xᵢ / T) / Σⱼ e^(xⱼ / T)
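A minimal sketch of temperature scaling, using the example logits from above (the function name is illustrative):

```python
import math

def softmax_t(logits, T=1.0):
    """Softmax with temperature T: lower T sharpens, higher T flattens."""
    scaled = [x / T for x in logits]           # divide every logit by T
    m = max(scaled)                            # shift for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax_t(logits, T=1.0))   # baseline distribution
print(softmax_t(logits, T=0.5))   # sharper: the top class gets more mass
print(softmax_t(logits, T=5.0))   # flatter: closer to uniform
```

As T → 0 the output approaches a one-hot vector on the largest logit; as T → ∞ it approaches the uniform distribution.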
In frameworks like PyTorch:
loss = nn.CrossEntropyLoss()(logits, targets) # DO NOT apply softmax manually
CrossEntropyLoss applies log-softmax internally for numerical stability, so it expects raw logits as input. Applying softmax manually feeds it probabilities instead of logits and produces an incorrect loss.
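A pure-Python sketch of the problem (no PyTorch; `cross_entropy` here mimics the per-example loss -log(softmax(logits)[target]), and all names are illustrative):

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Per-example cross-entropy: -log of the softmax probability of the target."""
    return -math.log(softmax(logits)[target])

logits = [2.0, 1.0, 0.1]
correct = cross_entropy(logits, 0)           # loss computed on raw logits
double = cross_entropy(softmax(logits), 0)   # softmax applied twice by mistake
print(correct, double)  # the two losses disagree
```

The mistaken version squashes the inputs into [0, 1] before the loss re-applies softmax, so the gradients no longer match the intended objective.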