# Forward Prop

Forward prop is fairly straightforward. The important thing to keep in mind is the matrix sizes.

#### Matrix Dimensions

The weight matrix of layer $l$ has dimensions (size of current layer $\times$ size of previous layer):

$w \in \boldsymbol{M}_{n^{l}\times n^{l-1}}(\mathbb{R})$

The bias matrix has dimensions:

$b \in \boldsymbol{M}_{n^l\times 1}(\mathbb{R})$

The output of each layer is a matrix of size $n^l \times m$, where $m$ is the number of samples.

### Gradient Descent in Neural Network

Gradient descent (backprop) is similar to how it was implemented in logistic regression: the gradients are computed with the chain rule from calculus, working backward from the output layer to the input.

### Initialization of Neural Network

This topic will be covered in more detail later. For now, just don't initialize the weights with zeros; that will not work, because every unit in a layer would then compute the same output and receive the same gradient, so the units never become different from one another. Biases can be zeros.
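
To make the shapes concrete, here is a minimal NumPy sketch of forward prop for a two-layer network. The layer sizes, the tanh hidden activation, and the sigmoid output are illustrative assumptions, not something fixed by these notes:

```python
import numpy as np

# Illustrative layer sizes (assumptions): 3 inputs -> 4 hidden -> 1 output
n = [3, 4, 1]          # n[l] = size of layer l (n[0] is the input layer)
m = 5                  # number of samples

rng = np.random.default_rng(0)

# Shapes follow the rules above: W[l] is (n[l], n[l-1]), b[l] is (n[l], 1)
W1 = rng.standard_normal((n[1], n[0])) * 0.01
b1 = np.zeros((n[1], 1))
W2 = rng.standard_normal((n[2], n[1])) * 0.01
b2 = np.zeros((n[2], 1))

X = rng.standard_normal((n[0], m))   # input: (n[0], m), one column per sample

# Forward prop: each layer's output is (n[l], m)
Z1 = W1 @ X + b1                     # (4, 3) @ (3, 5) + (4, 1) -> (4, 5)
A1 = np.tanh(Z1)                     # (4, 5)
Z2 = W2 @ A1 + b2                    # (1, 4) @ (4, 5) + (1, 1) -> (1, 5)
A2 = 1 / (1 + np.exp(-Z2))           # sigmoid output, (1, 5)

assert A1.shape == (n[1], m) and A2.shape == (n[2], m)
```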
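
Continuing the same sketch, the backward pass below applies the chain rule layer by layer, assuming a binary cross-entropy loss on the sigmoid output (an assumed setup; the notes only say backprop mirrors logistic regression). Note that each gradient has the same shape as the parameter it updates:

```python
# Backward pass, assuming binary cross-entropy loss on the sigmoid output
# (an illustrative choice, not specified in the notes).
Y = rng.integers(0, 2, size=(1, m))        # hypothetical 0/1 labels

dZ2 = A2 - Y                               # (1, m); sigmoid + BCE collapses to this
dW2 = (dZ2 @ A1.T) / m                     # (1, 4), same shape as W2
db2 = dZ2.sum(axis=1, keepdims=True) / m   # (1, 1), same shape as b2

dZ1 = (W2.T @ dZ2) * (1 - A1**2)           # (4, m); (1 - A1^2) is tanh'(Z1)
dW1 = (dZ1 @ X.T) / m                      # (4, 3), same shape as W1
db1 = dZ1.sum(axis=1, keepdims=True) / m   # (4, 1), same shape as b1

# One gradient descent step
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```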
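
Finally, a quick demonstration (using the same variables) of why zero weight initialization fails: with the weights at zero, every hidden unit computes the identical activation and later receives the identical gradient, so the symmetry is never broken.

```python
# With zero weights, all rows of Z1 (and A1) are identical: every hidden
# unit is a clone of the others, and backprop keeps them identical forever.
W1_zero = np.zeros((n[1], n[0]))
b1_zero = np.zeros((n[1], 1))
Z1_sym = W1_zero @ X + b1_zero
A1_sym = np.tanh(Z1_sym)
assert np.allclose(A1_sym, A1_sym[0])      # every hidden unit outputs the same row

# Small random weights break the symmetry; zero biases are fine.
W1_good = rng.standard_normal((n[1], n[0])) * 0.01
b1_good = np.zeros((n[1], 1))
```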