The softmax function is also called the normalized exponential function. It is a generalization of the logistic function that "squashes" a K-dimensional vector \(\mathbf{z}\) of arbitrary real values to a K-dimensional vector \(\sigma(\mathbf{z})\) of real values in the range [0, 1] that add up to 1. In probability theory, the output of the softmax function can be used to represent a categorical distribution, that is, a probability distribution over K different possible outcomes.

Example: We know that every image in MNIST is of a handwritten digit between zero and nine, so there are only ten possible things a given image can be. We want to be able to look at an image and give the probabilities for it being each digit. For example, our model might look at a picture of a nine and be 85% sure it's a nine, but give a 5% chance to some other digit and a small probability to all the rest, because it isn't completely sure.
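Concretely, the j-th component of the output is given by

    \sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \ldots, K.

A minimal Python sketch of this definition follows, using the common numerically stable variant (subtracting the maximum score before exponentiating). The logits vector here is hypothetical, chosen only so that the digit-nine class scores highest, echoing the example above:

    import numpy as np

    def softmax(z):
        """Map a vector of arbitrary real scores to probabilities that sum to 1."""
        # Subtracting the max avoids overflow in exp() and does not change the
        # result, since softmax is invariant to adding a constant to every component.
        e = np.exp(z - np.max(z))
        return e / e.sum()

    # Hypothetical logits for the ten MNIST digit classes (0-9); the values are
    # made up so that class 9 receives the highest probability.
    logits = np.array([0.1, 0.2, 0.1, 0.0, 0.3, 0.1, 0.2, 0.4, 1.5, 4.0])
    probs = softmax(logits)
    print(probs)           # ten values in [0, 1]
    print(probs.sum())     # 1.0
    print(probs.argmax())  # 9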