# Activation Functions
This section covers several common activation functions and their derivatives:
#### Sigmoid
##### Equation:
$\sigma(z) = g(z) = a = \frac{1}{1+e^{-z}}$
##### Derivative:
$g'(z) = a(1-a)$
Sigmoid looks like this:
<iframe src="https://www.desmos.com/calculator/j2tr9uogpu?embed" width="500" height="500" style="border: 1px solid #ccc" frameborder=0></iframe>
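A minimal NumPy sketch of the sigmoid and its derivative, using the $a(1-a)$ form above (the function names are my own, not from these notes):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: a = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """Derivative written in terms of the activation itself: g'(z) = a * (1 - a)."""
    a = sigmoid(z)
    return a * (1.0 - a)
```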
#### Tanh
Tanh almost always performs better than sigmoid as the activation function for a hidden layer. Use sigmoid only when it is needed as the output layer of a binary classification problem, where its $(0, 1)$ output can be read as a probability.
##### Equation:
$\tanh(z) = g(z) = a = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$
##### Derivative:
$g'(z) = (1-a^2)$
Tanh looks like this:
<iframe src="https://www.desmos.com/calculator/dhruayskkf?embed" width="500" height="500" style="border: 1px solid #ccc" frameborder=0></iframe>
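A similar sketch for tanh (again, the function names are assumed), reusing `np.tanh` and the $1 - a^2$ form of the derivative:

```python
import numpy as np

def tanh(z):
    """Tanh activation: a = (e^z - e^{-z}) / (e^z + e^{-z})."""
    return np.tanh(z)

def tanh_derivative(z):
    """Derivative in terms of the activation: g'(z) = 1 - a^2."""
    a = np.tanh(z)
    return 1.0 - a ** 2
```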
#### ReLU
Rectified Linear Unit
##### Equation:
$\mathrm{ReLU}(z) = g(z) = a = \max(0, z)$
##### Derivative:
$g'(z) =
\begin{cases}
0 & \text{if } z < 0\\
1 & \text{if } z > 0
\end{cases}
$
If $z$ is exactly 0, the derivative is undefined, but this is not a problem in practice: most neural network implementations simply use 0 or 1 at that point.
ReLU looks like this:
<iframe src="https://www.desmos.com/calculator/pyjy7lg76m?embed" width="500" height="500" style="border: 1px solid #ccc" frameborder=0></iframe>
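A sketch in the same style; returning 0 at $z = 0$ is the arbitrary convention mentioned above:

```python
import numpy as np

def relu(z):
    """ReLU activation: a = max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def relu_derivative(z):
    """Derivative: 0 for z < 0, 1 for z > 0 (0 is chosen at z == 0)."""
    return np.where(z > 0, 1.0, 0.0)
```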
#### Leaky ReLU
Leaky Rectified Linear Unit
##### Equation:
$\mathrm{LeakyReLU}(z) = g(z) = a = \max(0.01z, z)$
##### Derivative:
$g'(z) =
\begin{cases}
0.01 & \text{if } z < 0\\
1 & \text{if } z > 0
\end{cases}
$
Leaky ReLU looks like this:
<iframe src="https://www.desmos.com/calculator/7x2wdw13pl?embed" width="500" height="500" style="border: 1px solid #ccc" frameborder=0></iframe>
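And a sketch for Leaky ReLU; the `alpha` parameter name is an assumption here, defaulting to the $0.01$ slope used in the equation above:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU activation: a = max(alpha * z, z)."""
    return np.maximum(alpha * z, z)

def leaky_relu_derivative(z, alpha=0.01):
    """Derivative: alpha for z < 0, 1 for z > 0 (1 is chosen at z == 0)."""
    return np.where(z > 0, 1.0, alpha)
```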