# Activation Functions

Various activation functions and their derivatives were described. They include:

#### Sigmoid

##### Equation:
$\sigma(z) = g(z) = a = \frac{1}{1+e^{-z}}$

##### Derivative:
$g'(z) = a(1-a)$

Sigmoid looks like this:

<iframe src="https://www.desmos.com/calculator/j2tr9uogpu?embed" width="500" height="500" style="border: 1px solid #ccc" frameborder=0></iframe>

#### Tanh

Tanh almost always performs better than sigmoid as the activation function for a hidden layer. Use sigmoid only when it is needed as the last layer in a binary classification problem.

##### Equation:
$tanh(z) = g(z) = a = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$

##### Derivative:
$g'(z) = (1-a^2)$

Tanh looks like this:

<iframe src="https://www.desmos.com/calculator/dhruayskkf?embed" width="500" height="500" style="border: 1px solid #ccc" frameborder=0></iframe>

#### ReLU

Rectified Linear Unit

##### Equation:
$relu(z) = g(z) = a = max(0,z)$

##### Derivative:
$g'(z) = \begin{cases} 0 \text{ if } z < 0\\ 1 \text{ if } z > 0 \end{cases}$

If z is equal to 0, the derivative is undefined, but this is not a problem in most neural network implementations: in practice, either 0 or 1 is simply used at z = 0.

ReLU looks like this:

<iframe src="https://www.desmos.com/calculator/pyjy7lg76m?embed" width="500" height="500" style="border: 1px solid #ccc" frameborder=0></iframe>

#### Leaky ReLU

Leaky Rectified Linear Unit

##### Equation:
$leaky\_relu(z) = g(z) = a = max(0.01z, z)$

##### Derivative:
$g'(z) = \begin{cases} 0.01 \text{ if } z < 0\\ 1 \text{ if } z > 0 \end{cases}$

Leaky ReLU looks like this:

<iframe src="https://www.desmos.com/calculator/7x2wdw13pl?embed" width="500" height="500" style="border: 1px solid #ccc" frameborder=0></iframe>
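
Below is a minimal NumPy sketch of these four activations and their derivatives, written directly from the equations above. The function names and the `slope=0.01` parameter for Leaky ReLU are my own choices for illustration, not part of any particular library; note that the ReLU derivative arbitrarily returns 0 at z = 0, consistent with the remark above.

```python
import numpy as np


def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z))"""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_derivative(z):
    """g'(z) = a * (1 - a), expressed through the activation a."""
    a = sigmoid(z)
    return a * (1.0 - a)


def tanh(z):
    """tanh(z) = (e^z - e^-z) / (e^z + e^-z)"""
    return np.tanh(z)


def tanh_derivative(z):
    """g'(z) = 1 - a^2"""
    a = np.tanh(z)
    return 1.0 - a ** 2


def relu(z):
    """relu(z) = max(0, z)"""
    return np.maximum(0.0, z)


def relu_derivative(z):
    """g'(z) = 0 for z < 0, 1 for z > 0; z == 0 is mapped to 0 here."""
    return (z > 0).astype(float)


def leaky_relu(z, slope=0.01):
    """leaky_relu(z) = max(slope * z, z)"""
    return np.maximum(slope * z, z)


def leaky_relu_derivative(z, slope=0.01):
    """g'(z) = slope for z < 0, 1 for z > 0."""
    return np.where(z > 0, 1.0, slope)


if __name__ == "__main__":
    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print("sigmoid:    ", sigmoid(z), sigmoid_derivative(z))
    print("tanh:       ", tanh(z), tanh_derivative(z))
    print("relu:       ", relu(z), relu_derivative(z))
    print("leaky relu: ", leaky_relu(z), leaky_relu_derivative(z))
```

All functions are vectorized, so they can be applied elementwise to a whole layer's pre-activation matrix Z during forward and backward propagation.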