# Loss Functions
## Cross Entropy
All cross entropy losses are variants of the per-sample loss written as follows
$L(\hat{y}, y) = -\sum_{i=1}^C{y_i\log(\hat{y}_i)}$
where $C$ is the number of classes.
It may or may not be divided by the number of samples $m$ to get a mean loss over the dataset.
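A minimal NumPy sketch of this formula (the targets and predictions below are made-up values, just for illustration):
```python
import numpy as np

# Made-up batch of m=2 samples and C=3 classes
y = np.array([[1, 0, 0],
              [0, 0, 1]])            # one-hot targets
y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])  # predicted probabilities

# Per-sample loss: -sum_i y_i * log(y_hat_i)
per_sample = -np.sum(y * np.log(y_hat), axis=1)

print(per_sample)         # ~[0.357, 0.511]
print(per_sample.mean())  # divide by m for the mean loss, ~0.434
```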
### Binary Cross Entropy
This is used for binary classification, typically after a [[Activation Functions#Sigmoid|Sigmoid]] activation. The equation for binary cross entropy is:
$L(\hat{y}, y) = -(y\log(\hat{y}) + (1-y)\log(1-\hat{y}))$
The cost function is just the mean of the losses over all $m$ samples for a given value of $w$ and $b$:
$J(w,b) = \frac{1}{m}\sum_{i=1}^m{L(\hat{y}^{(i)}, y^{(i)})}$
[[Pytorch]]: `torch.nn.BCELoss`
[[Keras]]: `tf.keras.losses.BinaryCrossentropy` or `binary_crossentropy`
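A minimal sketch of the Pytorch call (the tensors are made-up values); note that `BCELoss` expects probabilities, so the sigmoid is applied explicitly before the loss:
```python
import torch
import torch.nn as nn

# Made-up logits from the final layer for a batch of 4 samples
logits = torch.tensor([1.2, -0.8, 0.3, 2.1])
targets = torch.tensor([1.0, 0.0, 0.0, 1.0])  # binary labels as floats

# BCELoss expects probabilities, so apply the sigmoid explicitly first
probs = torch.sigmoid(logits)
loss = nn.BCELoss()(probs, targets)  # mean over the batch by default
print(loss)
```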
### Categorical Cross Entropy / Negative Log Likelihood
This is the cross entropy you want to use after a [[Softmax]] layer for a multi-class (but single label) classifier.
In the one-hot target vector only one of the labels is 1 and the rest are all zeros, hence the cross entropy reduces to
$L(\hat{y}, y) = -\log(\hat{y}_p)$
or if you include the softmax activation inside the loss function:
$L(\hat{y}, y) = -\log\left(\frac{e^{z_p}}{\sum_{i=1}^C{e^{z_i}}}\right)$
where $p$ is the positive class
Pytorch: `torch.nn.CrossEntropyLoss`
Keras: `tf.keras.losses.CategoricalCrossentropy` or `categorical_crossentropy`
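A minimal sketch with made-up logits; `torch.nn.CrossEntropyLoss` folds the (log-)softmax into the loss, so it takes raw logits and integer class indices rather than softmax outputs and one-hot vectors:
```python
import torch
import torch.nn as nn

# Made-up raw logits z for a batch of 2 samples, C=3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3]])
targets = torch.tensor([0, 1])  # class indices, not one-hot vectors

# CrossEntropyLoss applies log-softmax internally, so pass the raw logits
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss)

# Keras can do the same if you keep the softmax inside the loss:
# tf.keras.losses.CategoricalCrossentropy(from_logits=True)(one_hot_targets, logits)
```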
### Multi-label Classification
Instead of a softmax layer you can use a [[Activation Functions#Sigmoid|Sigmoid]] activation at each output and pass it to a binary cross entropy loss. Pytorch combines the sigmoid and the loss into a single `BCEWithLogitsLoss`:
$L(\hat{y}, y) = -\frac{1}{C}\sum_{i=1}^C \left(y_i\log(\sigma(z_i)) + (1-y_i)\log(1-\sigma(z_i))\right)$
Pytorch: `torch.nn.BCEWithLogitsLoss`
Keras: just do `binary_crossentropy` after `sigmoid` activation
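A minimal sketch with made-up values; `BCEWithLogitsLoss` takes the raw logits and applies the sigmoid itself:
```python
import torch
import torch.nn as nn

# Made-up logits for a batch of 2 samples and C=3 independent labels
logits = torch.tensor([[1.5, -0.3, 0.8],
                       [-1.0, 2.0, 0.1]])
targets = torch.tensor([[1.0, 0.0, 1.0],   # multi-hot: a sample can have
                        [0.0, 1.0, 0.0]])  # several positive labels at once

# BCEWithLogitsLoss applies the sigmoid internally (more numerically stable
# than sigmoid + BCELoss) and averages over all elements by default
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss)
```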