Random Projection

More computationally efficient than PCA. You would use this over PCA if your system has limited computational resources, or if the data has too many dimensions for PCA to be practical.

Pick a random line and project the data onto it.

This doesn't really make sense in 2D, but in high dimensions it works really well.

Random projection is just multiplying by a random matrix.
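A minimal sketch of that multiplication, assuming a Gaussian random matrix and NumPy. The sizes, the seed and the variable names are illustrative choices, not something these notes prescribe.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_samples, n_features, n_components = 100, 5000, 500  # illustrative sizes
X = rng.normal(size=(n_samples, n_features))           # stand-in dataset

# Random matrix with i.i.d. Gaussian entries; the 1/sqrt(k) scaling keeps
# squared distances unchanged on average.
R = rng.normal(size=(n_features, n_components)) / np.sqrt(n_components)

X_proj = X @ R  # the whole "algorithm": one matrix multiplication

# Compare the squared distance between two rows before and after projecting.
before = np.sum((X[0] - X[1]) ** 2)
after = np.sum((X_proj[0] - X_proj[1]) ** 2)
print(X_proj.shape, after / before)
```

The printed ratio should land close to 1, which is the sense in which the projection keeps distances roughly intact.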

The squared distance between 2 rows (or 2 points) u and v in the transformed space will be larger than (1-eps)||u-v||^2 and smaller than (1+eps)||u-v||^2, where ||u-v||^2 is their squared distance in the original space.

Epsilon (eps) is a value between 0 and 1. It goes into the calculation of how many columns are produced, and it is the level of distortion we allow in the reduction of dimensionality: a smaller epsilon means less distortion but more columns. With enough columns, the distance between every pair of points in the dataset is preserved to within this tolerance, with high probability.
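To make the link between epsilon and the number of columns concrete, here is a small sketch that assumes one common form of the Johnson-Lindenstrauss bound, k >= 4 ln(n) / (eps^2/2 - eps^3/3), where n is the number of points. The notes above do not state which formula is used, so treat both the formula and the function name `min_components` as assumptions.

```python
import math

def min_components(n_samples: int, eps: float) -> int:
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be strictly between 0 and 1")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return math.ceil(4 * math.log(n_samples) / denominator)

# A tighter tolerance (smaller eps) asks for more columns.
for eps in (0.5, 0.2, 0.1):
    print(eps, min_components(10_000, eps))
```

Under this bound the number of columns depends only on the number of points and on epsilon, not on how many dimensions the data started with.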

It is NOT mandatory to specify the number of components/dimensions that we want Random Projection to reduce our dataset to. The algorithm can compute it from epsilon and the number of points.
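One way this optional behaviour could look, as a sketch rather than any particular library's implementation: `random_project` is a hypothetical helper that picks the number of columns itself when `n_components="auto"`, using the same assumed Johnson-Lindenstrauss bound, and otherwise uses the value it is given. scikit-learn's random projection classes expose a similar `n_components='auto'` option.

```python
import math
import numpy as np

def random_project(X, n_components="auto", eps=0.1, seed=None):
    """Hypothetical helper: Gaussian random projection of the rows of X."""
    n_samples, n_features = X.shape
    if n_components == "auto":
        # Derive k from the number of rows and eps (assumed JL bound).
        n_components = math.ceil(
            4 * math.log(n_samples) / (eps ** 2 / 2 - eps ** 3 / 3)
        )
        if n_components > n_features:
            raise ValueError(
                f"eps={eps} asks for {n_components} dimensions, "
                f"more than the original {n_features}"
            )
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(n_features, n_components)) / math.sqrt(n_components)
    return X @ R

X = np.random.default_rng(0).normal(size=(100, 3000))
print(random_project(X, eps=0.5, seed=0).shape)          # k chosen automatically
print(random_project(X, n_components=50, seed=0).shape)  # k given explicitly
```

The check against `n_features` is a design choice in this sketch: if epsilon is so small that the computed target is larger than the original space, projecting would not reduce anything, so the helper refuses instead.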
