# Softmax Activation
Softmax activation is typically used as the last layer in a multi-class classification problem, where each example belongs to exactly one of $c$ classes. It squashes a vector into the range $(0, 1)$ so that the resulting elements sum to 1 and can be read as class probabilities.
The softmax function is defined component-wise as
$g(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{c} e^{z_j}}, \quad i = 1, \dots, c$
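A minimal NumPy sketch of this formula (the helper name `softmax` and the max-subtraction step are our own additions; subtracting the max logit leaves the output unchanged because the constant cancels in the ratio, but it keeps `exp` from overflowing):

```python
import numpy as np

def softmax(z):
    # Subtract the max logit before exponentiating; the constant
    # cancels in the ratio, so the output is unchanged but exp()
    # can no longer overflow on large logits.
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # [0.659 0.242 0.099] (rounded)
print(probs.sum())  # 1.0
```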
The loss function paired with the softmax activation is the cross-entropy loss, where $y$ is the one-hot target vector and $\hat{y}$ is the softmax output:
$L(\hat{y},y) = -\sum_{j=1}^c y_j\log\hat{y}_j$
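As a sketch, the loss can be computed directly from the formula; the `eps` term is an assumption added here to guard against $\log(0)$:

```python
import numpy as np

def cross_entropy(y_hat, y, eps=1e-12):
    # y is a one-hot target, y_hat a vector of predicted
    # probabilities; eps avoids taking log(0).
    return -np.sum(y * np.log(y_hat + eps))

y = np.array([0.0, 1.0, 0.0])            # true class is index 1
y_hat = np.array([0.659, 0.242, 0.099])  # softmax output from above
print(cross_entropy(y_hat, y))           # -log(0.242) ≈ 1.419
```

Because $y$ is one-hot, only the term for the true class survives the sum, so the loss is simply the negative log-probability assigned to the correct class.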
Softmax with two classes is equivalent to logistic regression.
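To see why, divide the numerator and denominator of the first softmax component by $e^{z_1}$; the result is the sigmoid of the logit difference:
$g(z)_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}} = \sigma(z_1 - z_2)$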