# Neural Nets

### Universal approximation

A neural network can approximate any function to arbitrary accuracy, as long as it's big enough. This result is known as the universal approximation theorem.

### Code

```python
import numpy as np

# reshape takes a tensor and reshapes it to whatever size you request.
# You only need to specify n-1 out of n dimensions; pass -1 for the
# remaining one and it will be inferred automatically.
x_imgs = np.reshape(x_valid, (-1, 28, 28)); x_imgs.shape
```

### Normalization

In deep learning we need to normalize our inputs because we're training a parameterized model: if we don't, it's going to be harder to get a network that trains effectively.

We need to make sure that anything we do to the training set, we do in exactly the same way to the validation and test sets. So for normalization, we subtract the training set mean and divide by the training set standard deviation on the validation data (not the validation mean and std dev).
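A minimal sketch of this in NumPy. The arrays here are random stand-ins for the real training and validation data; the key point is that both splits are normalized with the *training* statistics:

```python
import numpy as np

# Hypothetical data standing in for the real training/validation sets
x_train = np.random.rand(1000, 784).astype(np.float32)
x_valid = np.random.rand(100, 784).astype(np.float32)

# Compute statistics on the training set ONLY
train_mean = x_train.mean()
train_std = x_train.std()

# Apply the same training-set statistics to every split
x_train = (x_train - train_mean) / train_std
x_valid = (x_valid - train_mean) / train_std  # training stats, not validation stats

# The training set is now approximately zero mean, unit variance;
# the validation set will be close, but not exactly, because its
# own statistics differ slightly from the training set's.
print(x_train.mean(), x_train.std())
```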

![](https://846345873-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LagOeJ2nL90MQERwhxy%2F-LrLhijAC7zz2J2FWPk8%2F-LrLt3qfTgSLsm17BuKp%2Fimage.png?alt=media\&token=0978a69c-76cf-4ac3-a2cc-f2415dc6e19b)

### Pytorch

Like NumPy, but you can run it on a GPU.

GPUs allow matrix computations to be done at much greater speeds.
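A small sketch of the "NumPy on a GPU" idea: the same matrix-multiply code runs on whichever device the tensors live on, falling back to CPU when no GPU is available:

```python
import torch

a = torch.rand(1000, 1000)
b = torch.rand(1000, 1000)

# Move to the GPU if one is available; otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
a, b = a.to(device), b.to(device)

# Matrix multiply runs on whichever device the tensors are on
c = a @ b
print(c.shape)  # torch.Size([1000, 1000])
```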

**Loss**

We need a loss function: the lower the loss, the better we're doing.

How do we score how well we're doing? At the end we compute the derivative of the loss with respect to the weight matrix, which tells us how to update the weights.

Negative log likelihood (NLL) loss, also known as cross entropy. There are two versions: binary cross entropy (binary NLL) for two classes, and multiclass cross entropy (multiclass NLL) for more.
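In PyTorch the multiclass version is `F.cross_entropy`, which combines log-softmax and NLL in one call. A small sketch with made-up logits showing the two forms are equivalent:

```python
import torch
import torch.nn.functional as F

# Raw scores (logits) for a batch of 2 examples, 3 classes (made-up numbers)
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 3.0, 0.4]])
targets = torch.tensor([0, 1])  # correct class index per example

# cross_entropy = log_softmax followed by negative log likelihood
loss = F.cross_entropy(logits, targets)

# The same thing written as two explicit steps
loss2 = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss, loss2))  # True
```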

### softmax function

Softmax is the activation function we use on the final layer. It gets called when that layer is calculated, taking the outputs from the previous layer. For each output we compute e^(output) and divide it by the sum of all the e^(output) values. Softmax gives us probabilities: because we're using e^(), every one of them is between 0 and 1, and together they sum to 1. Also, big values in the input turn out much bigger in the output (the final probability); for example, a logit of 3.66 might come out as a probability of 0.86.

The softmax non-linearity returns things that behave like probabilities, where one of them tends to be high and the others low. That's why it's a great function to use: it makes it easy for the NN to map to the outcome we want.
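The behavior above can be sketched in a few lines of NumPy (the logit values here are made up for illustration; the exact probability a given logit maps to depends on all the other logits):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged
    e = np.exp(z - z.max())
    return e / e.sum()

# Made-up final-layer outputs: one large logit among several small ones
logits = np.array([3.66, 1.0, 0.5, -0.2, 1.5])
probs = softmax(logits)

print(probs.round(2))   # every value is between 0 and 1
print(probs.sum())      # 1.0
print(probs.argmax())   # 0: the biggest logit gets the biggest probability
```

Note how the largest input dominates: exponentiating stretches the gap between the big logit and the rest, which is exactly the "one high, the others low" behavior described above.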

