Lesson 1

An important way to learn: focus on what goes in and what comes out.

Transforms

The image size you want should be 224 to ensure good results. You can try something different, but 224 is the convention. Square images make computation easier (rectangles are possible too).

Learners

There is one architecture that works really well almost all the time: ResNet. You just have to choose its size (e.g. resnet34 or resnet50).

It's an architecture with pretrained weights for a particular task: it was trained on ImageNet, covering all kinds of images across 1,000 classes of different things. This way we start with a model that already knows how to recognize 1,000 different things, including a bit of what cats and dogs look like. This is transfer learning.

Transfer learning

Take a model that already knows things pretty well and adapt it to your task. You can train models in roughly 1/100th of the time and with less data.

Validation set: a set of images that the model doesn't look at during training.

Fitting a model - new approaches

We can use fit(), but the better way is fit_one_cycle(). It's more accurate; as of 2018 it's the way to go for deep learning.

from fastai.vision import *  # fastai v1: provides ImageDataBunch, ConvLearner, models, error_rate

bs = 64  # batch size
tfms = get_transforms()
data = ImageDataBunch.from_folder(Path('flowers'), ds_tfms=tfms, size=224, bs=bs,
                                  test='test').normalize()  # .normalize(imagenet_stats) is typical with pretrained models

learn = ConvLearner(data, models.resnet34, metrics=error_rate)  # renamed cnn_learner in later fastai versions
learn.fit_one_cycle(4)  # 4 epochs is a good start

The metrics argument just prints out the metric during training; it's computed on the validation set. ImageDataBunch already creates a validation set for us. We need a validation set because without one we can't tell whether we're overfitting.
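A quick sanity check on that split (a minimal sketch; train_ds and valid_ds are the fastai v1 dataset attributes):

len(data.train_ds), len(data.valid_ds)  # number of training vs. validation images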

fit_one_cycle() - faster and better than other approaches.

Save once it's fit

learn.save('name')

Results - what comes out?

interp = ClassificationInterpretation.from_learner(learn)

With this object we can plot the top losses. Each image is shown with four things: the predicted class, the actual class, the loss, and the probability of the actual class.

interp.plot_top_losses(9, figsize=(15,11))

Confusion matrix - to see where the model got its predictions wrong.

interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

An alternative to the confusion matrix is most_confused(): which combinations of predicted and actual class did the model get wrong most often? It returns a list of (actual, predicted, count) tuples.

interp.most_confused(min_val=2)

Now that we've interpreted the results, how can we make the model better?

Unfreezing, fine-tuning, and learning rates

By default, when we have our ConvLearner and call fit_one_cycle, it only fine-tunes the last part of our model (the newly added final layers), not the pretrained convolutional layers. That's why it trains very fast: it just trains the last layers. It's also very unlikely to overfit this way.

However, to improve our model we'll want to train the whole thing. This is why we unfreeze (learn.unfreeze()). Unfreeze = please train the whole model.

Then we can run learn.fit_one_cycle(1) again.

Now, doing this with the default learning rate will actually increase the error.
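Pulling those two steps together (a minimal sketch; a single epoch is just an illustrative choice):

learn.unfreeze()        # make every layer group trainable, not just the final layers
learn.fit_one_cycle(1)  # with the default LR this typically makes the error worse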

Intuition behind this: the convolutional layers have been trained to identify simple patterns in the first layers and increasingly complex ones the deeper we go. Unfreezing means all of those layers, not just the last ones, will now be updated as we try to do better.

The layers of a NN represent different levels of semantic complexity.

Layer 1 finds basic shapes, like a line or a color gradient, and the last layers combine the previous ones to identify more specific things (like the faces of dog breeds). Since layer 1 is pretty much universal (a line is a line), it's the later layers, the ones that can identify dog faces, that we most want to change to get a better result.

When we unfroze, we applied the same LR to all the layers, so training tries just as hard to update the line and gradient detectors as the eyeball and face detectors. We need to change that.

So let's change that in the code

learn.load('name') - because we just broke the model, let's restore the one we saved earlier.

Let's run learn.lr_find(). It answers: what is the fastest rate I can train this NN at without making it diverge?

Then we run learn.recorder.plot() to plot the result of the LR finder: the learning rate against the resulting loss.
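Putting the last few steps together (a minimal sketch of the fastai v1 calls; 'name' is whatever you passed to learn.save earlier):

learn.load('name')     # restore the weights saved before unfreezing
learn.lr_find()        # try increasing learning rates while recording the loss
learn.recorder.plot()  # loss vs. learning rate: pick a rate from before the loss blows up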

The default LR is 0.003, and that corresponds on the graph to a high loss. Because we're trying to fine-tune things, we can't use such a high learning rate.

Learning rate - how quickly am I updating the weights in my model.

So we see here we want a smaller LR.

Let's fit again, starting with a smaller LR. But there's no point in training the early layers as much as the later ones. We can pass a range of learning rates with slice: the earliest layers get the super-small rate at the start of the range, and the rate gradually increases toward the last layers. We know from before that the last layers were doing fine, because we got a good result, and since we're starting from the loaded model that already does well on its last layers, we can afford a higher LR for them.
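In code (a minimal sketch; the bounds 1e-6 and 1e-4 are illustrative, read yours off the lr_find plot):

learn.unfreeze()
# slice(1e-6, 1e-4): the earliest layer group trains at 1e-6, the last at 1e-4,
# and the groups in between get rates spread across that range
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))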

Summary:

Use transfer learning: get a ResNet model (or another pretrained architecture).

Train the last part of it.

Then unfreeze to train the whole thing, and check which learning rates to use with lr_find() and recorder.plot().

Create a range for your learning rates with slice and fit the model. You'll get an even better result than if you had just trained the last part of the model. (The whole recipe is sketched below.)
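The whole recipe in one sketch (using the fastai v1 names from above; the save name 'stage-1' and the LR bounds are illustrative):

learn = ConvLearner(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)   # stage 1: train only the final layers
learn.save('stage-1')

learn.unfreeze()         # stage 2: train the whole model
learn.lr_find()
learn.recorder.plot()    # choose the LR range from this plot
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))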

Other ways to improve the model

Use a bigger model - Resnet50 for example.

You might get an out-of-memory error because the model won't fit in your GPU RAM. The fix: when creating the data, e.g. ImageDataBunch.from_name_re(..., bs=32 or 64), lower the batch size parameter bs. It controls how many images you train on at one time.
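For example, reusing the from_folder call from above (a minimal sketch; bigger models are often also trained with larger images, but the size is kept at 224 here to stay close to these notes):

data = ImageDataBunch.from_folder(Path('flowers'), ds_tfms=tfms, size=224, bs=32,
                                  test='test').normalize()
learn = ConvLearner(data, models.resnet50, metrics=error_rate)  # bigger ResNet
learn.fit_one_cycle(4)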
