# Backward Propagation
Backward propagation is the process of computing the derivative of the cost function at each step of the computation, working backwards from the output. Since the chain rule of derivatives can be applied as shown below, the derivatives at each step are multiplied together to obtain the value $dw$ used for the parameter update.
$\frac{dL}{dw} = \frac{dL}{da}\frac{da}{dz}\frac{dz}{dw}$
For logistic regression, the backprop derivation works out to
$\frac{dL}{dw} = (a - y) x$
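This follows from multiplying the three chain-rule factors for the cross-entropy loss $L = -\big(y\log a + (1-y)\log(1-a)\big)$ with $a = \sigma(z)$ and $z = w^T x + b$:
$\frac{dL}{da} = -\frac{y}{a} + \frac{1-y}{1-a}$
$\frac{da}{dz} = a(1-a)$
$\frac{dz}{dw} = x$
The first two factors multiply to $\frac{dL}{dz} = a - y$, and multiplying by $\frac{dz}{dw} = x$ gives the result above.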
The term $\frac{dL}{dw}$ is written simply as `dw` for easy notation when writing code.
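As a minimal sketch of how this looks in code (the function name `logistic_backprop` and the single-example shapes are illustrative assumptions, not from the original note):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_backprop(x, y, w, b):
    # x: (n,) feature vector, y: scalar label, w: (n,) weights, b: scalar bias
    z = np.dot(w, x) + b   # linear step
    a = sigmoid(z)         # prediction
    dz = a - y             # dL/da * da/dz
    dw = dz * x            # dL/dw = (a - y) * x
    db = dz                # dL/db = (a - y)
    return dw, db
```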
## Gradient Descent
Backprop is combined with gradient descent to train neural networks. Gradient descent is an iterative method for finding a local minimum of a function by moving in the opposite direction of the gradient of the function at the current point (because this is the direction of steepest descent).
The gradient is multiplied by a learning rate $\alpha$ so that the update does not diverge:
$w := w-\alpha{\frac{\partial J(w,b)}{\partial w}}$
$b := b-\alpha{\frac{\partial J(w,b)}{\partial b}}$
In code:
```python
w = w - learning_rate * dw
```
The term $\frac{\partial J}{\partial w}$ is written as `dw` for easy notation when writing code.
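Putting backprop and the update rule together, a rough sketch of a full training loop for logistic regression (vectorized over $m$ examples; the names `X`, `Y`, and `num_iterations` are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, learning_rate=0.01, num_iterations=1000):
    # X: (n, m) matrix of m examples, Y: (1, m) labels in {0, 1}
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(num_iterations):
        # Forward pass
        A = sigmoid(np.dot(w.T, X) + b)   # (1, m) predictions
        # Backward pass: gradients averaged over all examples
        dZ = A - Y                        # (1, m)
        dw = np.dot(X, dZ.T) / m          # (n, 1)
        db = np.sum(dZ) / m               # scalar
        # Gradient descent step
        w = w - learning_rate * dw
        b = b - learning_rate * db
    return w, b
```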
---
Related: [[Deep Learning]]