# Neural Network Architecture

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlIBp_3VTX7RqO48m42%2Fimage.png?alt=media\&token=057a7e49-dd0d-4e40-a651-9f6c6092d6a9)

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlIBtaqOv5jl4Uf4jdw%2Fimage.png?alt=media\&token=01da889a-a313-4cb2-b0f8-19e857df6aad)

You take the probability each of the 2 models assigns to the point, add them, and apply the sigmoid function to get the new probability.

What if we want to weight the sum?

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlICKb0JZLQjyCs0XDq%2Fimage.png?alt=media\&token=dd136ff5-ded5-4a8e-8cb0-d1e03d6c8508)

We have a combination of the 2 previous models, plus the weights and the bias.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlICkUPnBHd0ZzjiBCe%2Fimage.png?alt=media\&token=76aa7591-007f-47dd-bc12-f4bb75154bad)
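In code, this is just the sigmoid applied to a weighted sum of the two models' outputs plus a bias. A minimal NumPy sketch, where the probabilities `p1`, `p2`, the weights, and the bias are made-up numbers for illustration:

```python
import numpy as np

def sigmoid(x):
    """Squash any real number into (0, 1)."""
    return 1 / (1 + np.exp(-x))

# Hypothetical probabilities the point gets from the two linear models
p1, p2 = 0.7, 0.8

# Unweighted combination: add the two probabilities, apply the sigmoid
combined = sigmoid(p1 + p2)

# Weighted combination: weights and a bias (illustrative values)
w1, w2, b = 7.0, 5.0, -6.0
weighted = sigmoid(w1 * p1 + w2 * p2 + b)
```

Note that both results are again valid probabilities in (0, 1), so the combined model can itself be fed into another layer.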

### Feedforward

Simple version

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlIJ-L6R1xWKrHNEc_0%2Fimage.png?alt=media\&token=e22872df-0a94-44af-8448-d3f03c072076)

The perceptron outputs a probability. In this case the probability will be small because the point is incorrectly classified. This process is known as feedforward.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlIKFygW78uU1Ja5JKL%2Fimage.png?alt=media\&token=2c0b5ae2-22ad-4197-8113-f807ebd76ac1)

y\_hat is the probability that the point is labeled blue. This is what neural networks do: they take the input vector and apply a sequence of linear models and sigmoid functions. When combined, these maps become highly non-linear.

Our prediction is therefore y\_hat = .. see the formula above: multiplications of matrices alternated with sigmoid functions. What about the error?
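That chain of matrix multiplications and sigmoids can be sketched directly in NumPy. The input vector and weight matrices below are hypothetical, with the bias folded into the input as a constant 1:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def feedforward(x, W1, W2):
    """y_hat = sigmoid(W2 @ sigmoid(W1 @ x)): alternate matrix products and sigmoids."""
    hidden = sigmoid(W1 @ x)       # first layer: two linear models + sigmoid
    return sigmoid(W2 @ hidden)    # second layer: combine them + sigmoid

# Hypothetical 2-input, 2-hidden-unit, 1-output network
x = np.array([0.5, -1.0, 1.0])        # [x1, x2, bias input]
W1 = np.array([[1.0, -2.0, 0.5],
               [0.3,  0.8, -1.0]])    # weights of the two hidden linear models
W2 = np.array([[2.0, -1.5]])          # weights combining the hidden units
y_hat = feedforward(x, W1, W2)
```

Each layer's output stays in (0, 1), so y\_hat can be read as the probability that the point is labeled blue.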

Recall that the error for our perceptron was:&#x20;

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlIL5K43VqfG8jkrjHE%2Fimage.png?alt=media\&token=b8df1fd3-631f-4464-9d32-5d48f6397438)

We can actually use the same error function for the multilayer perceptron. It's just that y\_hat will be a little more complicated.

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlILGJt40I3m6a0Lmnr%2Fimage.png?alt=media\&token=7f6a9c0d-f78a-44b0-83b4-9a6968b24daf)
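A quick NumPy sketch of that error function, the cross-entropy from the formula above, evaluated on made-up labels and predictions:

```python
import numpy as np

def cross_entropy_error(y, y_hat):
    """E = -(1/m) * sum(y*ln(y_hat) + (1-y)*ln(1-y_hat))."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Hypothetical labels and model probabilities
error = cross_entropy_error([1, 0, 1], [0.9, 0.2, 0.8])
```

Confident correct predictions (y\_hat near the label) give a small error; confident wrong ones blow it up, which is exactly what we want to minimize.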

### Backpropagation

In a nutshell, backpropagation will consist of:

* Doing a feedforward operation.
* Comparing the output of the model with the desired output.
* Calculating the error.
* Running the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
* Using this to update the weights and get a better model.
* Continuing this process until we have a good model.
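The steps above can be sketched end to end on the simplest possible case: a single sigmoid unit, where the gradient has a closed form. The dataset, learning rate, and epoch count below are made up for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical tiny dataset: two features per point, binary labels
X = np.array([[0.2, 0.8], [0.9, 0.1], [0.7, 0.6], [0.1, 0.4]])
y = np.array([1, 0, 1, 0])

rng = np.random.default_rng(0)
W = rng.normal(size=2)
b = 0.0
learning_rate = 0.5

for epoch in range(1000):
    # 1. Feedforward
    y_hat = sigmoid(X @ W + b)
    # 2-3. Compare with the desired output and calculate the error
    error = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # 4. Backpropagate: gradient of the error w.r.t. the weights and bias
    grad_W = X.T @ (y_hat - y) / len(y)
    grad_b = np.mean(y_hat - y)
    # 5. Update the weights to get a better model
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b
```

For a multilayer network the loop is identical; only step 4 grows, since the chain rule has to carry the error back through every layer.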

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlITnu2OhccgFVAFdiC%2Fimage.png?alt=media\&token=d4628613-6860-4770-91a1-acf0a1bacd47)

Gradient descent: we take each weight W\_i\_j super k and update it by subtracting a small number: the learning rate \* the partial derivative of E with respect to that same weight. This gives us the new updated weight W'\_i\_j super k.
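The update rule itself is one line of NumPy; it applies elementwise to a whole weight matrix at once. The weight matrix and gradient below are made-up numbers:

```python
import numpy as np

def gradient_step(W, grad_E, learning_rate=0.1):
    """W'_ij^(k) = W_ij^(k) - learning_rate * dE/dW_ij^(k), elementwise."""
    return W - learning_rate * grad_E

# Hypothetical layer-k weight matrix and the error gradient backprop produced for it
W = np.array([[0.5, -0.2],
              [0.1,  0.4]])
grad = np.array([[0.3, -0.1],
                 [0.0,  0.2]])
W_new = gradient_step(W, grad)
```

Weights with a zero gradient are left untouched; the rest move a small step against the direction that increases the error.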

Feedforward again:

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlIbn9JcasRkGZPKNaY%2Fimage.png?alt=media\&token=3be18ec1-84f1-4517-b70e-5c08c25a4cc3)

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LlHc-VJp0xdVwOheBCk%2F-LlIc6R09hwN-de3x6Ws%2Fimage.png?alt=media\&token=fc164dd7-f124-4551-9e1f-8d50c7837a21)

