# Loss Functions
## Cross Entropy
All cross entropy losses are variants of the per-sample loss written as follows
$L(\hat{y}, y) = -\sum_{i=1}^C{y_i\log(\hat{y}_i)}$
where $C$ is the number of classes.
It may or may not be divided by the number of samples $m$ to get a mean loss over the dataset.
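A minimal NumPy sketch of this formula (the targets and predictions below are made-up values, just for illustration):
```python
import numpy as np

# Made-up batch of m=2 samples and C=3 classes
y = np.array([[1, 0, 0],
              [0, 0, 1]])            # one-hot targets
y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])  # predicted probabilities

# Per-sample loss: -sum_i y_i * log(y_hat_i)
per_sample = -np.sum(y * np.log(y_hat), axis=1)

print(per_sample)         # ~[0.357, 0.511]
print(per_sample.mean())  # divide by m for the mean loss, ~0.434
```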
### Binary Cross Entropy
This is used for binary classification, typically after a [[Activation Functions#Sigmoid|Sigmoid]] activation. The equation for binary cross entropy is:
$L(\hat{y}, y) = -(y\log(\hat{y}) + (1-y)\log(1-\hat{y}))$
The cost function is just the mean of the losses over all $m$ samples for a given value of $w$ and $b$:
$J(w,b) = \frac{1}{m}\sum_{i=1}^m{L(\hat{y}^{(i)}, y^{(i)})}$
[[Pytorch]]: `torch.nn.BCELoss`
[[Keras]]: `tf.keras.losses.BinaryCrossentropy` or `binary_crossentropy`
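A minimal sketch of the Pytorch call (the tensors are made-up values); note that `BCELoss` expects probabilities, so the sigmoid is applied explicitly before the loss:
```python
import torch
import torch.nn as nn

# Made-up logits from the final layer for a batch of 4 samples
logits = torch.tensor([1.2, -0.8, 0.3, 2.1])
targets = torch.tensor([1.0, 0.0, 0.0, 1.0])  # binary labels as floats

# BCELoss expects probabilities, so apply the sigmoid explicitly first
probs = torch.sigmoid(logits)
loss = nn.BCELoss()(probs, targets)  # mean over the batch by default
print(loss)
```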
### Categorical Cross Entropy / Negative Log Likelihood
This is the cross entropy you want to use after a [[Softmax]] layer for a multi-class (but single label) classifier.
In the one-hot target vector only one of the labels is 1 and the rest are all zeros, hence the cross entropy reduces to
$L(\hat{y}, y) = -\log(\hat{y}_p)$
or if you include the softmax activation inside the loss function:
$L(\hat{y}, y) = -\log\left(\frac{e^{z_p}}{\sum_{i=1}^C{e^{z_i}}}\right)$
where $p$ is the positive class
Pytorch: `torch.nn.CrossEntropyLoss`
Keras: `tf.keras.losses.CategoricalCrossentropy` or `categorical_crossentropy`
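A minimal sketch with made-up logits; `torch.nn.CrossEntropyLoss` folds the (log-)softmax into the loss, so it takes raw logits and integer class indices rather than softmax outputs and one-hot vectors:
```python
import torch
import torch.nn as nn

# Made-up raw logits z for a batch of 2 samples, C=3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3]])
targets = torch.tensor([0, 1])  # class indices, not one-hot vectors

# CrossEntropyLoss applies log-softmax internally, so pass the raw logits
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss)

# Keras can do the same if you keep the softmax inside the loss:
# tf.keras.losses.CategoricalCrossentropy(from_logits=True)(one_hot_targets, logits)
```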
### Multi-label Classification
Instead of a softmax layer you can use a [[Activation Functions#Sigmoid|Sigmoid]] activation at each output and pass it to a binary cross entropy loss. Pytorch combines the sigmoid and the loss into a single `BCEWithLogitsLoss`:
$L(\hat{y}, y) = -\frac{1}{C}\sum_{i=1}^C \left(y_i\log(\sigma(z_i)) + (1-y_i)\log(1-\sigma(z_i))\right)$
Pytorch: `torch.nn.BCEWithLogitsLoss`
Keras: just do `binary_crossentropy` after `sigmoid` activation
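A minimal sketch with made-up values; `BCEWithLogitsLoss` takes the raw logits and applies the sigmoid itself:
```python
import torch
import torch.nn as nn

# Made-up logits for a batch of 2 samples and C=3 independent labels
logits = torch.tensor([[1.5, -0.3, 0.8],
                       [-1.0, 2.0, 0.1]])
targets = torch.tensor([[1.0, 0.0, 1.0],   # multi-hot: a sample can have
                        [0.0, 1.0, 0.0]])  # several positive labels at once

# BCEWithLogitsLoss applies the sigmoid internally (more numerically stable
# than sigmoid + BCELoss) and averages over all elements by default
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss)
```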