Lesson 1
The best way to learn:
Focus on what goes in and what comes out.
Transforms
The image size you use should be 224 to get good results. You can try something different, but 224 is the convention. Square images make the computation easier (rectangles are possible too).
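A minimal sketch of how this looks with the fastai v1 API used in this lesson; the path and filename pattern below are hypothetical placeholders for a pets-style dataset:

```python
from fastai.vision import *

# Hypothetical path and filename pattern - adapt them to your own dataset.
path_img = Path('data/pets/images')
fnames = get_image_files(path_img)
pat = r'/([^/]+)_\d+.jpg$'          # class name is taken from the filename

# size=224: square images, the conventional size for these pretrained models.
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=224, bs=64)
data.normalize(imagenet_stats)      # normalize with ImageNet statistics
```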
Learners
There is one architecture that works really well almost all the time: ResNet. You just have to choose the size (e.g. resnet34 or resnet50).
It's an architecture that comes with pretrained weights for a particular task: it was trained on ImageNet, which contains all kinds of images across 1,000 classes of different things. This way we start with a model that already knows how to recognize 1,000 different things, so it already knows a bit about what cats and dogs look like. This is transfer learning.
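A minimal sketch of creating such a learner, assuming the fastai v1 API (depending on the library version the function is called ConvLearner, create_cnn or cnn_learner):

```python
# Build a learner from the DataBunch and an ImageNet-pretrained resnet34;
# error_rate is the metric reported on the validation set during training.
learn = create_cnn(data, models.resnet34, metrics=error_rate)
```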
transfer learning
Take a model that already knows how to do something pretty well and adapt it to your task. You can train models in roughly 1/100th of the time, and with less data.
Validation set = a set of images that the model doesn't look at during training.
Fitting a model - new approaches
We can use fit, but the better way is to use fit_one_cycle: it's more accurate. As of 2018 it's the way to go for deep learning.
The metrics argument just prints out the metric during training; it is computed on the validation set. The DataBunch already creates a validation set for us. We need a validation set because without one we can't tell whether we're overfitting.
fit_one_cycle() - faster and better than other approaches.
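For example (a sketch; the number of epochs here is just an illustration):

```python
# Train the head for 4 epochs with the one-cycle policy; the metric passed
# to the learner (error_rate) is reported on the validation set each epoch.
learn.fit_one_cycle(4)
```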
Save once it's fit
learn.save('name')
Results - what comes out?
With the interpretation object we can plot the top losses. For each example it shows 4 things: the predicted category, the actual category, the loss, and the probability of the actual class.
Confusion matrix - to see where it got the predictions wrong.
An alternative to the confusion matrix is most_confused(): which combinations of predicted and actual class did the model get wrong most often?
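A sketch of those three interpretation steps with fastai v1 (the figure sizes are just examples):

```python
# Build the interpretation object from the trained learner.
interp = ClassificationInterpretation.from_learner(learn)

# Top losses: prediction, actual, loss, probability of the actual class.
interp.plot_top_losses(9, figsize=(15, 11))

# Confusion matrix, and the (actual, predicted, count) pairs it gets wrong most often.
interp.plot_confusion_matrix(figsize=(12, 12), dpi=60)
interp.most_confused(min_val=2)
```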
Now that we interpreted the results, how can we make the model better?
Unfreezing, fine-tuning, and learning rates
By default, when we have our ConvLearner and run fit_one_cycle, it only fine-tunes the last part of the model (the newly added layers), not the rest of the convolutional neural network. That's why it trains very fast: it only trains the last layers, and it is very hard to overfit this way.
However, to improve our model we'll want to train the whole thing. This is why we unfreeze (learn.unfreeze()). Unfreeze = please train the whole model.
Then we can run learn.fit_one_cycle(1) again.
Now, doing this with the default learning rate will actually increase the error.
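Those two steps as code (a sketch with fastai v1):

```python
learn.unfreeze()        # allow every layer's weights to be updated
learn.fit_one_cycle(1)  # with the default learning rate this makes the error worse
```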
Intuition behind this: the convolutional layers were trained to identify simple patterns in the first layers and increasingly complex ones the deeper we go. Unfreezing means all of those layers now get trained again, to see if we can do better.
The layers of a NN represent different levels of semantic complexity.
Layer 1 basically finds simple shapes, like a line or a color gradient, and the last layers combine the previous ones to identify more specific things (like the faces of dog breeds). Since layer 1 is pretty much universal (a line is a line), it's the later layers, the ones that can identify dog faces, that we want to change to get a better result.
When we unfroze, we applied the same LR to all the layers, so training tries just as hard to update the lines and gradients as the eyeballs and faces. We need to change that.
So let's change that in the code
learn.load('name') - because we just broke the model, let's load the one we saved earlier.
Let's run learn.lr_find(): what is the fastest rate I can train this NN at without making it fail?
Then we run learn.recorder.plot() to plot the result of the LR finder. This shows the learning rate against the resulting loss.
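A sketch of that sequence with fastai v1, using the same saved name as above:

```python
learn.load('name')      # get back the model we saved before breaking it
learn.lr_find()         # sweep the learning rate from tiny to huge, recording the loss
learn.recorder.plot()   # plot loss vs. learning rate from that sweep
```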
The default LR is 0.003, and that corresponds to a high loss on the graph. Because we're trying to fine-tune things, we can't use such a high learning rate.
Learning rate - how quickly am I updating the weights in my model?
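As a toy illustration of what that means for a single weight under plain gradient descent (not the exact optimizer fastai uses):

```python
# One update step: the learning rate scales how far the weight jumps
# in the direction that reduces the loss.
lr = 0.003
weight, gradient = 1.0, 0.5
weight = weight - lr * gradient   # bigger lr -> bigger jump per update
```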
So we see here we want a smaller LR.
Let's fit again, starting with a smaller LR. But there's no point in training the early layers as much as the later ones, so we can pass a range of learning rates with slice. The range says: give the first layers a learning rate that is super small and gradually increase the LR for the later layer groups. We know from before that the later layers were doing fine, because we got a good result with them. Since we're using the loaded model, which already does well in the last layers, we can give those last layers a higher LR than the early ones.
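A sketch of that call with fastai v1 (the exact bounds of the slice are just example values):

```python
learn.unfreeze()
# slice(1e-6, 1e-4): the earliest layer group trains at 1e-6, the last group
# at 1e-4, with the groups in between spread across that range.
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))
```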
Summary:
Use transfer learning: get a ResNet model (or another pretrained architecture).
Train the last part of it.
Then unfreeze to train the whole thing, and check which learning rates to use with lr_find() and recorder.plot().
Create a range for your learning rates with slice and fit the model. You'll get an even better result than if you had only trained the last part of the model.
Other ways to improve the model
Use a bigger model - Resnet50 for example.
You might get an out-of-memory error because the model won't fit in your GPU RAM. The fix is the bs parameter when we use ImageDataBunch.from_name_re(..., bs=64 or 32): the batch size, i.e. how many images you train on at one time.
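A sketch of both changes together (smaller batch size and a bigger architecture), assuming the same fastai v1 calls as before:

```python
# bs=32: fewer images per batch so the bigger model fits in GPU RAM.
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=224, bs=32)
data.normalize(imagenet_stats)

# Same recipe as before, with the larger resnet50 backbone.
learn = create_cnn(data, models.resnet50, metrics=error_rate)
```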