Clearer Decision-Making with PCA

Learn how to generate the best chart of your multidimensional data
In our daily digital life, we are provided with plenty of information meant to help us make better decisions: lists of products and services, and for each, a complete data sheet, along with a list of reviews, and for each, more information about the reviewer, and so on... This amount of data can feel overwhelming for our decision-making: which information is relevant? What best distinguishes two similar products?
Enter Principal Component Analysis (PCA), a process routinely used in data exploration, machine learning, and data compression. PCA offers a different look at your data, providing you with the best visual summary possible.
This doc will give you an intuitive grasp of PCA and the tools to apply it to your decision-making.

So What is PCA?

(Image: a tree and its shadow)
Let’s imagine you’d like to understand how the leaves on a tree are distributed.
We could measure the position and orientation of each leaf, but that's a lot of leaves, and a lot of data!
Alternatively, we could look at the tree’s shadow, which provides a good indication of this distribution. It’s simpler to look at, even if it’s not the tree itself.
We could do even better: we could choose the position of the sun which puts most distance between the leaves’ shadows — which separates them most.
In a nutshell, this is what PCA can do for us: it determines the best lower-dimensional space for our dataset. "Best" in the sense that it maximises the dispersion of its elements.
(image posted on Reddit by u/DanRG02)
PCA is incredibly useful:
it computes at once the best representation of our dataset in several forms: as a ranking (1 dimension), as a map (2 dimensions), as a cloud (3 dimensions), and so on.
PCA tells us how much of the original dataset is actually explained by each form.
PCA factors in any correlations between your variables.
PCA works with as many samples and variables as needed — in the hundreds is quite common.
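To make the "maximise the dispersion" idea concrete, here is a minimal plain-Python sketch: given a 2-D cloud of points, it finds the single direction along which the points are most spread out, which is the first principal component (the best "position of the sun" in the tree analogy). The toy dataset and the power-iteration shortcut are illustrative choices, not how PCA libraries are actually implemented.

```python
import random

# Toy dataset: an elongated cloud of points near the diagonal y = x,
# with a little noise (purely illustrative data).
random.seed(0)
data = [(t + random.gauss(0, 0.3), t + random.gauss(0, 0.3))
        for t in [i / 10 for i in range(30)]]

# 1. Centre each variable on its mean.
n = len(data)
means = [sum(p[d] for p in data) / n for d in range(2)]
pts = [(x - means[0], y - means[1]) for x, y in data]

# 2. Covariance matrix of the centred data.
cov = [[sum(p[i] * p[j] for p in pts) / n for j in range(2)] for i in range(2)]

# 3. Power iteration: repeatedly applying the covariance matrix to a vector
#    turns it into the direction of maximum variance, i.e. the first
#    principal component.
v = [1.0, 0.0]
for _ in range(100):
    w = [cov[0][0] * v[0] + cov[0][1] * v[1],
         cov[1][0] * v[0] + cov[1][1] * v[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    v = [w[0] / norm, w[1] / norm]

print("first principal component direction:", v)
```

Since the cloud stretches along the diagonal, the direction printed comes out close to the diagonal itself; the variance of the points along it is at least as large as along either original axis.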

Decision-Making using PCA

PCA does not make the decision for us, but it gives us a good grasp of the available options. Say you want to decide which movie to watch next, based on their reviews:
Movie Reviews
   Name                       Variety  The New York Times  Vanity Fair  RogerEbert.com
1  Lightyear                  50%      60%                 35%          55%
2  Minions: The Rise of Gru   30%      40%                 65%          65%
3  The Batman                 45%      60%                 65%          30%
4  The Northman               60%      75%                 35%          30%
5  Thor: Love and Thunder     70%      60%                 25%          45%
6  Top Gun: Maverick          50%      60%                 65%          25%
(Movie buffs, beware: the scores have been changed so that each movie has an average score of 50%.)
PCA will find the position of each movie along an axis called the first principal component:
(Chart: Movie Reviews along the first principal component)
For our next trip to the silver screen, based on these reviews, the starkest choice is between “The Northman” and “Minions”. And if the first one was up your alley, you’ll most likely appreciate “Thor” too. Same for “Top Gun” and “Lightyear”.

Of course, by summarising all these reviews along a single axis, we’re losing information. But PCA also tells us we’re keeping 65.27% of it.
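The positions along that axis can be reproduced from the table's scores in plain Python. This is a sketch: power iteration is an illustrative numerical shortcut, and a real analysis would typically use a library such as scikit-learn or NumPy. Note that the sign of the axis is arbitrary, so the ranking may come out reversed.

```python
# Reviews from the table above: one row per movie, one column per outlet.
movies = ["Lightyear", "Minions: The Rise of Gru", "The Batman",
          "The Northman", "Thor: Love and Thunder", "Top Gun: Maverick"]
scores = [
    [50, 60, 35, 55],
    [30, 40, 65, 65],
    [45, 60, 65, 30],
    [60, 75, 35, 30],
    [70, 60, 25, 45],
    [50, 60, 65, 25],
]
n, d = len(scores), len(scores[0])

# Centre each critic's scores on that critic's mean.
means = [sum(row[j] for row in scores) / n for j in range(d)]
X = [[row[j] - means[j] for j in range(d)] for row in scores]

# Covariance matrix of the centred scores.
cov = [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]

# First principal component via power iteration (illustrative shortcut).
v = [1.0, 0.5, 0.25, 0.125]
for _ in range(500):
    w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    norm = sum(x * x for x in w) ** 0.5
    v = [x / norm for x in w]

# Position of each movie along the component, and the share of the
# dataset's total variance this single axis keeps.
pos = [sum(r[j] * v[j] for j in range(d)) for r in X]
kept = sum(p * p for p in pos) / sum(sum(x * x for x in r) for r in X)

for name, p in sorted(zip(movies, pos), key=lambda t: t[1]):
    print(f"{p:7.2f}  {name}")
print(f"variance kept by one axis: {kept:.2%}")
```

The "variance kept" figure printed at the end is how the percentage quoted above is obtained: the variance of the 1-D positions divided by the total variance of the centred scores.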
If you can’t make up your mind between “Top Gun” and “Lightyear”, let’s have a look at their positions along the second principal component. Since we now have two coordinates, we can display a map along these two components:
(Chart: Movie Reviews along the first two principal components)
(The components are not shown on the diagram).
If you liked “The Batman” as much as the reviewers, it’s likely that you’ll prefer “Top Gun” over “Lightyear”.

By using the first two components, PCA tells us that we’re keeping 95.59% of the information in these reviews. So we can safely decide using only this map!
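The second component, and the cumulative percentage quoted above, can be sketched in plain Python too. The trick used here is "deflation": once the first component is known, remove its contribution from the covariance matrix, and the top direction of what remains is the second principal component. As before, power iteration and deflation are illustrative shortcuts, not how a library such as scikit-learn would do it internally.

```python
# Centred review scores from the table above (rows: movies, columns: outlets).
scores = [[50, 60, 35, 55], [30, 40, 65, 65], [45, 60, 65, 30],
          [60, 75, 35, 30], [70, 60, 25, 45], [50, 60, 65, 25]]
n, d = len(scores), len(scores[0])
means = [sum(row[j] for row in scores) / n for j in range(d)]
X = [[row[j] - means[j] for j in range(d)] for row in scores]
cov = [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]

def top_component(m):
    """Direction of maximum variance of matrix m, by power iteration."""
    v = [1.0, 0.5, 0.25, 0.125]
    for _ in range(500):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# First principal component and its variance.
v1 = top_component(cov)
lam1 = sum(v1[i] * sum(cov[i][j] * v1[j] for j in range(d)) for i in range(d))

# Deflation: subtract the first component's variance, then find the top
# direction of what remains -- the second principal component.
cov2 = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
v2 = top_component(cov2)
lam2 = sum(v2[i] * sum(cov2[i][j] * v2[j] for j in range(d)) for i in range(d))

# 2-D coordinates for the map, and the cumulative share of variance kept.
coords = [(sum(r[j] * v1[j] for j in range(d)),
           sum(r[j] * v2[j] for j in range(d))) for r in X]
total = sum(cov[i][i] for i in range(d))
print(f"variance kept by the 2-D map: {(lam1 + lam2) / total:.2%}")
```

The two directions come out orthogonal, and the cumulative share is simply the sum of the two components' variances over the total variance, which is where the map's percentage comes from.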
💡 Now, curious minds will ask: in the chart above, what does it mean to move from left to right, or from bottom to top?
That’s what we’ll explore next: how to interpret these components.