Lesson 2

Learning rate finder

What you are looking for is the strongest downward slope that sticks around for quite a while; you're not really looking for bumps. Always test which learning rates work best.
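A minimal sketch of running the finder, assuming the fastai v1-era API these notes follow (learn is an already-built learner):

learn.lr_find()          # runs a short training pass across a range of learning rates
learn.recorder.plot()    # plots loss vs. learning rate; pick a value on the strongest downward slope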

Fixing the noise in the data

What if Google Image search doesn't give you the right images every time?

Combining a human with the machine is the best way to go.

We're going to look at the data points the model got most wrong and check for noise in the data. These are the data points that might be mislabeled.

Cleaning up

top_losses() - returns the top images that were the worst, along with their indices. It returns the whole dataset, sorted by loss.

Also, every dataset in fastai has an x and a y. So if we pass the idxs to our x, it gives us the images in the dataset (usually our validation dataset) that the model wasn't sure about. In our particular case we're using our valid_ds. You would also re-run all the steps with the training and test sets.

We can then use FileDeleter(file_paths=top_loss_paths). Then you can delete the images that didn't work.

FileDeleter uses a GUI - see the link for more examples. It's not good for productionizing because it only lives in the notebook; it's meant for other practitioners. For productionizing you need to build a production web app.
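A sketch of that cleanup flow, assuming the fastai v1-era API these notes use (FileDeleter lived in the widgets module at the time and was later replaced by ImageCleaner, so treat the names as approximate):

from fastai.vision import *             # ClassificationInterpretation, etc.
from fastai.widgets import FileDeleter  # early fastai v1 widget

interp = ClassificationInterpretation.from_learner(learn)  # learn is the trained model
losses, idxs = interp.top_losses()                         # whole validation set, sorted by loss
top_loss_paths = data.valid_ds.x[idxs]                     # the x side: the actual image paths
FileDeleter(file_paths=top_loss_paths)                     # GUI widget to review and delete noisy images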

Inference - you have your trained model and you are predicting things. You'll want to use a CPU for inference in production, unless you have huge numbers of visitors, in which case you have a lot more problems on top of that.

open_image() to open an image...
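A minimal inference sketch under the same fastai v1 assumptions (the image path is illustrative):

import torch
from fastai.vision import *                       # open_image, defaults, etc.

defaults.device = torch.device('cpu')             # do inference on the CPU
img = open_image('data/bears/some_image.jpg')     # illustrative path to a single image
pred_class, pred_idx, probs = learn.predict(img)  # predicted category, its index, class probabilities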

Production

The example uses Starlette, which lets you use await, which allows for asynchronous code - so it's not tying up a process while it's waiting for things.
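A rough Starlette sketch of what such an endpoint can look like (the route, names, and response are illustrative, not the course's exact app; older Starlette versions from this era support the @app.route decorator):

from starlette.applications import Starlette
from starlette.responses import JSONResponse
import uvicorn

app = Starlette()

@app.route("/classify", methods=["GET"])
async def classify(request):
    url = request.query_params["url"]        # image URL passed as a query parameter
    # await the download, open the image, then call learn.predict(img) here
    return JSONResponse({"prediction": "teddy bear"})

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)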

Free hosting - PythonAnywhere.

Errors with training, errors, etc.

Learning rates, valid and training loss: it's not good if your training loss is higher than your valid loss. This means you haven't trained enough - either your learning rate is too low or you haven't run enough epochs. A correct model has a train loss lower than the valid loss. It's especially bad if the train loss is WAY higher than the valid loss.

Too many epochs - overfitting - the model doesn't generalize well.

You are overfitting if the error rate improves for a while and then starts getting worse again.

Learning what train loss, epochs, learning rates, etc. are

np.argmax - finds the highest number and tells you what its index is.
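For example:

import numpy as np

probs = np.array([0.1, 0.7, 0.2])
np.argmax(probs)   # 1 - the index of the highest value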

Also, metrics are always going to be applied to the validation set.

Check out matrixmultiplication.xyz.

Two things multiplied together plus two other things multiplied together, summed up - that's a dot product. When you have lots of those (for example with the yi's and xi's), it's called a matrix product. So yi = a1*x1i + a2*x2i can be rewritten as y = Xa.

The a1, a2 are the coefficients, and there's just one set of them (no i's).

We can now use PyTorch to run y = Xa in one line of code.
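A small sketch, close to the lesson's toy example (the numbers are illustrative):

import torch

n = 100
x = torch.ones(n, 2)
x[:, 0].uniform_(-1., 1)     # column 0: random inputs; column 1 stays at 1 for the intercept
a = torch.tensor([3., 2.])   # the coefficients a1, a2
y = x @ a                    # y = Xa in one line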

Unbalanced data

What to do? Nothing. Try it. It always works. If there really aren't many examples of a class, the best thing to do is take the class that doesn't have a lot of representation and make a lot of copies of it - oversampling - but it's rare that you would need to do that.
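If you ever do need to oversample, a quick sketch with pandas (the toy data and label names are made up):

import pandas as pd

df = pd.DataFrame({'label': ['common'] * 8 + ['rare'] * 2})    # toy data: 8 vs 2
rare = df[df['label'] == 'rare']                               # the under-represented class
df_balanced = pd.concat([df] + [rare] * 3, ignore_index=True)  # add extra copies of it
df_balanced['label'].value_counts()                            # now 8 vs 8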

Stochastic Gradient Descent

x@a is a matrix multiplication, or a vector-matrix, vector-vector, or tensor multiplication.

tensor: it's an array. A 1d array, 2d, 3d, 4d, etc.

Dimension = rank.
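For example:

import torch

v = torch.ones(3)          # rank 1 (a vector)
m = torch.ones(3, 4)       # rank 2 (a matrix)
t = torch.ones(2, 3, 4)    # rank 3
v.dim(), m.dim(), t.dim()  # (1, 2, 3) - dimension in this sense is the rank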

If I write a = tensor(-1., 1), then writing just that one decimal point tells PyTorch they're all floats (instead of having to write (-1.0, 1.0)).
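With plain PyTorch that looks like this (the lesson's tensor(-1., 1) uses fastai's convenience wrapper around torch.tensor):

import torch

a = torch.tensor([-1., 1])   # the single decimal point makes both elements floats
a.dtype                      # torch.float32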

How do we fit the data? Stochastic gradient descent. It's almost the same as trying to fit a line to a graph with a bunch of data points.

You want to find the parameters (weights) such that you minimize the error between the points and the line x@a (a is unknown). For a regression problem the most common error function, or loss function, is the mean squared error.

When we get a line, we get a loss, and we want to improve the line slightly. The gradient (the derivative) tells us in which direction to move the line. In PyTorch this is done with loss.backward(). What happens to the derivatives? They get put into an attribute called .grad.
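The update() function below uses an mse helper; the lesson defines it roughly like this:

def mse(y, y_hat):
    return ((y - y_hat) ** 2).mean()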

def update():
    y_hat = x@a
    loss = mse(y, y_hat)
    if t % 10 == 0: print(loss)
    loss.backward()
    with torch.no_grad():
        a.sub_(lr * a.grad) # take the coef a and subtract (.sub) our grad.
                            # the trailing _ means in place. lr keeps the step tiny.
                            # we subtract because we want to move opposite to the grad.
        a.grad.zero_()
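update() also relies on x, y, a, lr, and t being defined at the notebook level; the lesson drives it with a small loop roughly like this, re-initializing a as the parameter to learn (with x and y as in the earlier sketch):

import torch
import torch.nn as nn

a = nn.Parameter(torch.tensor([-1., 1.]))  # initial guess for the coefficients; tracks gradients for .backward()
lr = 1e-1                                  # learning rate: how big a step we take
for t in range(100):                       # t is the step counter printed every 10 steps
    update()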

grad: all the derivatives are stored in grad.

SGD vs gradient descent - we grab a batch of size 64 instead of calculating the loss on every single point (or image, or whatever). We grab 64 images AT RANDOM, calculate the loss on those 64 images, and update the weights.

The batch of 64 is called a mini-batch. That approach is called SGD.
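A sketch of what grabbing one mini-batch looks like, continuing the same toy example (names are illustrative):

batch_size = 64
idx = torch.randperm(x.shape[0])[:batch_size]   # 64 rows picked at random
y_hat = x[idx] @ a
loss = mse(y[idx], y_hat)                       # loss on just this mini-batch
loss.backward()                                 # gradients for the update step, as in update()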

For classification problems we use cross-entropy loss, aka negative log likelihood loss. This penalizes incorrect confident predictions and correct unconfident predictions.
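A quick illustration with PyTorch's cross-entropy (the logits here are made up):

import torch
import torch.nn.functional as F

target = torch.tensor([0])                       # the true class is class 0
confident_right = torch.tensor([[5.0, -5.0]])    # logits strongly favoring class 0
unconfident_right = torch.tensor([[0.2, 0.0]])   # barely favoring class 0
confident_wrong = torch.tensor([[-5.0, 5.0]])    # strongly favoring the wrong class

F.cross_entropy(confident_right, target)    # tiny loss
F.cross_entropy(unconfident_right, target)  # moderate loss
F.cross_entropy(confident_wrong, target)    # large loss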

Vocab:

Epoch - one run through all of your data. But each time you see a data point, you run the risk of overfitting, which is why you don't want too many epochs.

SGD: gradient descent using mini-batches (random subsets of the data).

Parameters = weights = coefficients

Regularization: all the techniques that make it so that when you train your model, it generalizes well to data it hasn't seen.