Neural Nets
A neural network can approximate any other function to arbitrarily close accuracy, as long as it's big enough.
In deep learning we need to normalize our inputs because we're trying to train a parameterized model. If we don't, it's going to be harder to get a network that trains effectively.
We need to make sure that anything we do to the training set, we do in exactly the same way to the validation and test sets. So for normalization, we subtract the training set mean and divide by the training set standard deviation on the validation data (not the validation mean and standard deviation).
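A minimal sketch of this in NumPy, with made-up arrays standing in for the real data:

```python
import numpy as np

# Made-up data standing in for the real training/validation sets
x_train = np.random.rand(1000, 20)
x_valid = np.random.rand(200, 20)

# Statistics are computed on the training set only
train_mean = x_train.mean()
train_std = x_train.std()

x_train = (x_train - train_mean) / train_std
# The validation set uses the *training* mean and std, not its own
x_valid = (x_valid - train_mean) / train_std
```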
Like NumPy, but you can run it on a GPU.
GPUs allow matrix computations to be done at much greater speeds.
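Assuming the library in question is PyTorch, a small sketch of running a matrix computation on a GPU might look like:

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)

# Move the tensors to the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
a, b = a.to(device), b.to(device)

c = a @ b           # the matrix multiply now runs on the GPU (when available)
print(c.shape)      # torch.Size([3, 5])
```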
Loss
We need a loss function: the lower it is, the better we're doing.
How do we score how well we're doing? At the end we're going to compute the derivative of the loss with respect to the weight matrix, to figure out how to update it.
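As a rough illustration (assuming PyTorch, with a made-up tiny linear model), computing a loss and its derivative with respect to the weight matrix could look like:

```python
import torch
import torch.nn.functional as F

# Made-up batch: 64 examples, 20 features, 2 classes
x = torch.randn(64, 20)
y = torch.randint(0, 2, (64,))

# A single weight matrix standing in for the model's parameters
w = torch.randn(20, 2, requires_grad=True)

logits = x @ w
loss = F.cross_entropy(logits, y)  # lower is better

# Derivative of the loss with respect to the weight matrix
loss.backward()
print(w.grad.shape)  # torch.Size([20, 2]) -- used to update w
```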
Negative log likelihood loss - AKA cross entropy. There are two versions: binary cross entropy (binary NLL) and multi-class NLL (categorical cross entropy).
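A sketch of the two versions using PyTorch's built-in functions, with invented values:

```python
import torch
import torch.nn.functional as F

# Binary version: one predicted probability per example vs a 0/1 target
probs = torch.tensor([0.9, 0.2, 0.7])
targets = torch.tensor([1.0, 0.0, 1.0])
bce = F.binary_cross_entropy(probs, targets)

# Multi-class version: log-probabilities over classes vs an integer class index
log_probs = F.log_softmax(torch.tensor([[2.0, 0.5, -1.0]]), dim=1)
nll = F.nll_loss(log_probs, torch.tensor([0]))

print(bce.item(), nll.item())
```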
This will get called when the layer gets calculated. You pass it the data from the previous layer. Softmax is the activation function we use: it takes the outputs from our final layer, computes e^(each_output), and divides each of those by the sum of all the e^ values. Softmax gives us probabilities. Because we're using e^(), we know that every one of the probabilities is between 0 and 1. Also, big values in the input will turn out much bigger in the output (the final probability): an output of 3.66 initially turns into a probability of 0.86.
The softmax non-linearity returns things that behave like probabilities, where one of the probabilities is likely to be high and the others low. That's why it's a great function to use: it makes it easy for the NN to map to the outcome we want.
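A minimal softmax sketch with hypothetical final-layer outputs (the numbers here are invented, not the ones from the example above):

```python
import torch

# Hypothetical outputs (logits) from the final layer
logits = torch.tensor([3.66, 0.02, -2.49, 1.25])

# e^(each output) divided by the sum of all the e^ values
probs = torch.exp(logits) / torch.exp(logits).sum()

print(probs)           # every value is between 0 and 1
print(probs.sum())     # tensor(1.) -- they behave like probabilities
print(probs.argmax())  # the biggest input gets by far the biggest probability
```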