Neural Nets

Universal approximation

A neural network can approximate any function to arbitrarily close accuracy, as long as it's big enough.

Code

import numpy as np

# x_valid holds the flattened validation images, shape (n, 784)
# reshape returns an array with whatever shape you request; you only need to specify
# n-1 of the n dimensions - pass -1 for the remaining one and it is inferred automatically
x_imgs = np.reshape(x_valid, (-1, 28, 28)); x_imgs.shape

Normalization

In deep learning we need to normalize the inputs because we're training a parameterized model. If we don't, it's harder to get a network that trains effectively.

Whatever we do to the training set, we must do the exact same thing to the validation and test sets. For normalization, that means subtracting the training set mean and dividing by the training set standard deviation on the validation data (not the validation set's own mean and standard deviation).
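A minimal sketch of this (the names x_train and x_valid are placeholders here, not the course's variables):

import numpy as np

# Stand-in data; in practice these would be the real training and validation arrays
x_train = np.random.rand(1000, 784).astype(np.float32)
x_valid = np.random.rand(200, 784).astype(np.float32)

# Compute the statistics on the training set only
train_mean, train_std = x_train.mean(), x_train.std()

x_train = (x_train - train_mean) / train_std
# Normalize the validation set with the *training* mean and std dev
x_valid = (x_valid - train_mean) / train_std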

PyTorch

Like numpy, but you can run it on a GPU.

GPUs allow matrix computations to be done at much greater speed.
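A minimal sketch of the numpy/PyTorch relationship (assumes a CUDA GPU; otherwise the tensor simply stays on the CPU):

import numpy as np
import torch

a = np.random.rand(1000, 1000).astype(np.float32)
t = torch.from_numpy(a)        # same data, now a torch tensor

if torch.cuda.is_available():
    t = t.cuda()               # move the tensor to the GPU

result = t @ t                 # matrix multiply runs on the GPU if t lives there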

Loss

We need a loss function: the lower the loss, the better we're doing.

How do we score how well we're doing? In the end we're going to compute the derivative of the loss with respect to the weight matrix, to figure out how to update the weights.

Negative log likelihood loss, a.k.a. cross entropy. There are two versions: binary cross entropy (binary NLL) for two classes, and multi-class NLL.
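A minimal sketch of cross entropy (the logits and target here are made up for illustration): the loss is the negative log of the probability assigned to the true class.

import torch
import torch.nn.functional as F

logits = torch.tensor([[1.0, 3.66, 0.5]])   # raw final-layer outputs for one example
target = torch.tensor([1])                   # index of the true class

# Manual version: softmax, then -log of the probability of the true class
probs = F.softmax(logits, dim=1)
manual_nll = -torch.log(probs[0, target.item()])

# PyTorch's built-in combines log-softmax and NLL in one call
builtin = F.cross_entropy(logits, target)

print(manual_nll.item(), builtin.item())     # both print the same value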

Softmax function

This gets called when the layer is computed; you pass it the data from the previous layer. Softmax is the activation function we use on the final layer: it takes each output, computes e^(output), and divides by the sum of all the e^(output) values. Softmax gives us probabilities: because we're exponentiating, every one of them is between 0 and 1 (and they sum to 1). Also, large values in the input come out much larger in the output probability: an output of 3.66 can turn into a probability of about 0.86.

The softmax non-linearity returns values that behave like probabilities, where one of them tends to be high and the others low. This is why it's a great function for making it easy for the network to map to the outcome we want.
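A minimal sketch of the computation (the two smaller outputs are made up so that 3.66 maps to roughly the 0.86 figure mentioned above):

import numpy as np

def softmax(outputs):
    exps = np.exp(outputs)        # e^(each output)
    return exps / exps.sum()      # divide by the sum of the exponentials

final_layer_outputs = np.array([3.66, 1.5, 0.6])
print(softmax(final_layer_outputs))   # roughly [0.86, 0.10, 0.04] - the largest input dominates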