# Neural Networks and Deep Learning

## Week 1

Week 1 introduces neural networks. It explains what supervised learning is, the difference between structured data (e.g., data in databases) and unstructured data (images, text, etc.), and finally why neural networks are taking off now. Moving from the sigmoid activation function to the ReLU function made training neural networks faster, because the gradient of the sigmoid is close to zero at very high and very low values, which slows learning.

## Week 2

Week 2 uses logistic regression to illustrate the math behind neural networks. It introduces concepts such as the computational graph, forward propagation, backward propagation, gradient descent, and the loss/cost function. It also covers some Python fundamentals such as vectorization and broadcasting.

### Logistic Regression

Logistic regression is a binary classifier. For example, it can classify whether a 64 x 64 px image is a cat or not-cat. If the image is in color (RGB), the total number of features in the image is $64 * 64 * 3 = 12288$. This is denoted by the notation $n_x$. The number of images that can be trained on is called the sample size and is denoted by $m$.

The equation for logistic regression is:

$\widehat{y} = \sigma(w^Tx + b)$

where $\sigma$ is the sigmoid function that looks like this:

<iframe src="https://www.desmos.com/calculator/m9ttsavltn?embed" width="500px" height="500px" style="border: 1px solid #ccc" frameborder=0></iframe>

The sigmoid function is:

$\sigma(z) = \frac{1}{1+e^{-z}}$

If $z$ is a large positive number, $\sigma(z)$ approaches 1; if $z$ is a large negative number, $\sigma(z)$ approaches 0. (A minimal NumPy sketch of one vectorized logistic-regression step is at the end of this note.)

### ![[Loss Functions]]

### Computational Graph

The computational graph is just a sequence of mathematical operations written graphically. For example, the following is a simple computational graph of the forward propagation step.

```mermaid
graph LR;
    x((x)) --> z(("z = wx + b"))
    w((w)) --> z
    b((b)) --> z
    z --> a(("a = sigmoid(z)"))
    a --> L(("Loss(a, y)"))
```

![[backward propagation]]

## Week 3

In week 3 Andrew Ng covers a "shallow" neural network. The shallow neural network described is shown below; a forward-propagation sketch for it appears at the end of this note.

```mermaid
graph LR;
    subgraph "Layer0 (input)"
        x1((x1));
        x2((x2));
        x3((x3));
    end
    subgraph "Layer1 (hidden)"
        a1((a11));
        a2((a12));
        a3((a13));
        a4((a14));
    end
    subgraph "Layer2 (output)"
        a21((a21));
    end
    x1 --> a1;
    x1 --> a2;
    x1 --> a3;
    x1 --> a4;
    x2 --> a1;
    x2 --> a2;
    x2 --> a3;
    x2 --> a4;
    x3 --> a1;
    x3 --> a2;
    x3 --> a3;
    x3 --> a4;
    a1 --> a21;
    a2 --> a21;
    a3 --> a21;
    a4 --> a21;
    a21 --> l((Loss));
```

To count the number of layers in a neural network, do not count the input layer but do count the output layer. A 1-layer neural network is basically logistic regression.

### ![[Activation Functions]]

### ![[Forward Prop]]

## Week 4

Week 4 covers a "deep" neural network as opposed to a "shallow" one. He gives some intuition on why deep neural networks perform better than shallow ones (the non-linearity added by the activation functions lets successive layers compose simple features into more complex ones). There is a summary of the equations used (just an extension of the previous lectures). He introduces the concept of hyperparameters (learning rate, number of layers, number of hidden units, activation function, number of epochs, etc.).

---
Related: [[Deep Learning]]
Date: December 29th 2020 0:53 AM
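To make the Week 2 math above concrete, here is a minimal NumPy sketch of one vectorized gradient-descent step for logistic regression. This is not code from the course: the function name `logistic_regression_step`, the learning rate of 0.01, and the tiny synthetic data are illustrative assumptions. The shapes and formulas follow the notation above ($X$ holds $m$ examples of $n_x$ features each, $\widehat{y} = \sigma(w^Tx + b)$) with the standard cross-entropy cost.

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_step(X, Y, w, b, learning_rate=0.01):
    """One vectorized gradient-descent step for logistic regression.

    X: (n_x, m) matrix of m training examples with n_x features each
    Y: (1, m) row vector of 0/1 labels
    w: (n_x, 1) weight vector, b: scalar bias
    """
    m = X.shape[1]

    # Forward propagation: A holds y-hat for all m examples at once
    Z = np.dot(w.T, X) + b          # broadcasting adds b to every column
    A = sigmoid(Z)                  # shape (1, m)

    # Cross-entropy cost averaged over the m examples
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward propagation: derivatives of the cost w.r.t. w and b
    dZ = A - Y                      # (1, m)
    dw = np.dot(X, dZ.T) / m        # (n_x, 1)
    db = np.sum(dZ) / m

    # Gradient-descent update
    w = w - learning_rate * dw
    b = b - learning_rate * db
    return w, b, cost

# Tiny synthetic run: 12288 features (a 64 x 64 x 3 image), 5 examples
n_x, m = 12288, 5
rng = np.random.default_rng(0)
X = rng.random((n_x, m))
Y = rng.integers(0, 2, size=(1, m))
w, b = np.zeros((n_x, 1)), 0.0
w, b, cost = logistic_regression_step(X, Y, w, b)
print(cost)
```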
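Likewise, a minimal sketch of forward propagation for the Week 3 shallow network (3 inputs, 4 hidden units, 1 output). The `shallow_forward` name, the tanh hidden activation, and the small random initialization are assumptions made for illustration; the layer shapes match the diagram above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def shallow_forward(X, params):
    """Forward propagation through a 2-layer (one hidden layer) network.

    X: (3, m)  -- three input features per example, as in the diagram
    params["W1"]: (4, 3), params["b1"]: (4, 1)  -- hidden layer, 4 units
    params["W2"]: (1, 4), params["b2"]: (1, 1)  -- output layer, 1 unit
    """
    Z1 = np.dot(params["W1"], X) + params["b1"]   # (4, m)
    A1 = np.tanh(Z1)                              # hidden activations
    Z2 = np.dot(params["W2"], A1) + params["b2"]  # (1, m)
    A2 = sigmoid(Z2)                              # output y-hat in (0, 1)
    return A2

# Small random initialization (all-zero weights would make every
# hidden unit compute the same thing)
rng = np.random.default_rng(1)
params = {
    "W1": rng.standard_normal((4, 3)) * 0.01, "b1": np.zeros((4, 1)),
    "W2": rng.standard_normal((1, 4)) * 0.01, "b2": np.zeros((1, 1)),
}
X = rng.random((3, 2))          # m = 2 examples
print(shallow_forward(X, params))
```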