# Softmax Activation

Softmax activation is typically used as the last layer in a multi-class, single-label classification problem. It squashes a vector so that every element lies in the range (0, 1) and all the elements sum to 1. For a vector $z$ of $c$ class scores, the softmax of the $i$-th element is

$$g(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{c} e^{z_j}}$$

The loss function paired with the softmax activation is the cross-entropy loss:

$$L(\hat{y}, y) = -\sum_{j=1}^{c} y_j \log \hat{y}_j$$

A softmax over two classes reduces to logistic regression: for $z = (z_1, z_2)$, $g(z)_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_1 - z_2)}}$, which is the sigmoid of the score difference.
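As a concrete illustration, here is a minimal NumPy sketch of both formulas; the function names `softmax` and `cross_entropy` and the sample logits are illustrative choices, not from the original. Subtracting the row maximum before exponentiating is the standard numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis.
    Subtracting the max avoids overflow in exp without changing the output."""
    z_shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(z_shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

def cross_entropy(y_hat, y, eps=1e-12):
    """Cross-entropy loss L(y_hat, y) = -sum_j y_j * log(y_hat_j).
    eps guards against log(0)."""
    return -np.sum(y * np.log(y_hat + eps), axis=-1)

# Example: 3-class logits and a one-hot label for class 1 (illustrative values)
z = np.array([2.0, 1.0, 0.1])
y = np.array([0.0, 1.0, 0.0])

y_hat = softmax(z)
print(y_hat, y_hat.sum())       # probabilities that sum to 1
print(cross_entropy(y_hat, y))  # equals -log(y_hat[1])

# Two-class softmax equals the sigmoid of the score difference
z2 = np.array([1.5, -0.5])
sigmoid = 1.0 / (1.0 + np.exp(-(z2[0] - z2[1])))
print(softmax(z2)[0], sigmoid)  # both print the same probability
```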