PCA computes values (matrices, really) from a dataset of samples, each of which has a value for every variable. In Coda, a table naturally represents the dataset: each sample is a row, each variable is a column.
This pack provides you with two sync tables:
Principal Components, which gives you the values of the first two Principal Components for each sample in your dataset,
Loadings, which gives you the weight of each variable in each Principal Component.
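If you are curious about what sits behind these two tables, here is a minimal sketch of a PCA in plain Python. It is not the Pack's actual code. It assumes each variable is standardized first and uses power iteration to find the components; the Pack may do either differently. The data and the function names (`standardize`, `top_eigenpair`, `pca`) are made up for illustration.

```python
import math

# Made-up data for illustration only: 5 samples (rows) x 3 variables (columns).
samples = [
    [2.0, 1.0, 4.0],
    [3.0, 2.5, 3.0],
    [4.0, 3.0, 5.0],
    [5.0, 4.5, 1.0],
    [6.0, 5.0, 2.0],
]

def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def covariance(z):
    """Sample covariance matrix of already-centered rows."""
    n, p = len(z), len(z[0])
    return [
        [sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def top_eigenpair(matrix, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    # Asymmetric start vector, so it is unlikely to be orthogonal to the answer.
    v = [1.0 / (i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v

def pca(rows, n_components=2):
    """Return eigenvalues, components (variable weights) and per-sample scores."""
    z = standardize(rows)
    c = covariance(z)
    eigenvalues, components = [], []
    for _ in range(n_components):
        value, vector = top_eigenpair(c)
        eigenvalues.append(value)
        components.append(vector)
        # Deflation: remove the component just found before looking for the next.
        c = [
            [c[i][j] - value * vector[i] * vector[j] for j in range(len(c))]
            for i in range(len(c))
        ]
    # Scores: projection of each standardized sample on each component.
    scores = [
        [sum(a * b for a, b in zip(row, comp)) for comp in components]
        for row in z
    ]
    return eigenvalues, components, scores

eigenvalues, components, scores = pca(samples)
for row in scores:
    print([round(x, 3) for x in row])  # one [Pc1, Pc2] pair per sample
```

In this sketch, `scores` plays the role of the Principal Components table (one row per sample) and `components` plays the role of the Loadings table (one weight per variable). The sign of each component is arbitrary, so a different tool may return the same values with the signs flipped.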
To use the Principal Components sync table
Drag the table to your document from the Pack tab on the right,
On the table, press the “Choose what to sync” button,
In the Label entry field, select the column with the names of your samples (e.g. MovieReviews.MovieName, DrinkingHabits.CountryName, etc),
In the Variable1 entry field, select the column with the values for your first variable (e.g. MovieReviews.Vanity, or DrinkingHabits.Spirits),
In the Variable2 entry field, select the column with the values for your second variable (e.g. MovieReviews.TheNewYorkTimes, or DrinkingHabits.Wine),
Add as many Variables as needed (up to 6) using “Add criteria”,
Once all variables have been added, press “Start sync”,
The sync table will return a single column, “Principal Components”. Hover over the column header to display the pulldown menu showing a small jigsaw piece and choose “PCA Pack - Principal Components options”. In the dialog box that opens, click “Related columns” and “Add” each one of them,
Now the sync table shows all samples as rows, with their own label, and their value along the first Principal Component (column Pc1) and the second PC (column Pc2):
Drinking Habits by Principal Components

Label             Pc1       Pc2
France           -1.395    -1.619
Italy            -1.760    -0.808
Switzerland      -1.102    -0.372
Austria          -0.332     1.120
UK                0.162     0.931
USA               0.445     0.405
Russia            3.409    -2.056
Czech Republic    1.403     2.076
Japan            -0.722    -0.126
Mexico           -0.108     0.448
The sync table can be displayed as a Scatter chart, with Pc1 on the horizontal axis, Pc2 on the vertical axis, and segmented by Label.
To use the Loadings sync table
Drag the table to your document from the Pack tab on the right,
On the table, press the “Choose what to sync” button
In the VariableNames entry field, input the list of how your variables are named, e.g.
Alternatively, create a table with a Text column, with each row listing one variable name, and in the VariableNames entry field, select this Text column, e.g.:
Variable Names

Name
Spirits
Wine
Beer
Life Expectancy
Heart Disease Rate
In the Variable1 entry field, select the column with the values for your first variable (e.g. MovieReviews.Vanity, or DrinkingHabits.Spirits)
In the Variable2 entry field, select the column with the values for your second variable (e.g. MovieReviews.TheNewYorkTimes, or DrinkingHabits.Wine)
Add as many Variables as needed (up to 6) using “Add criteria”
Once all variables have been added, press “Start sync”
The sync table will return a single column, “Loadings”. Hover over the column header to display the pulldown menu showing a small jigsaw piece and choose “PCA Pack - Loadings options”. In the dialog box that opens, click “Related columns” and “Add” each one of them.
Now the sync table shows all variables as rows, with their own label, and their weight for each principal component:
The last row of the sync table gives you the percentage of data explained by using respectively the first PC, the first two PCs, the first three PCs, etc. You can retrieve the first one with this formula (replacing XXX by the corresponding name):
Format("{1}%", 100 * Loadings.Filter(Variable Name = "XXX Percentage Explained").Principal Component1)
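To see where such cumulative percentages come from, here is a small Python sketch. It assumes "percentage explained" means the usual cumulative share of variance, computed from the eigenvalues of the components; the source does not confirm the Pack computes it exactly this way. The eigenvalues and the function name `cumulative_share` are hypothetical.

```python
def cumulative_share(eigenvalues):
    """Fraction of total variance explained by the first k components, for each k."""
    total = sum(eigenvalues)
    running = 0.0
    shares = []
    for value in eigenvalues:
        running += value
        shares.append(running / total)
    return shares

# Hypothetical eigenvalues, sorted from largest to smallest.
shares = cumulative_share([3.0, 1.5, 0.5])

# Same presentation as the Coda formula above: a number followed by "%".
first_pc = f"{100 * shares[0]:.1f}%"
print(first_pc)
```

The last entry of `shares` is always 1 (100%), since all components together explain all of the variance.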
To display only the Loading values, you need to filter out the last row: in the Filter tab on the right, press “Add filter” and select “Variable Name” “does not contain” “Percentage Explained”.
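If you export the Loadings table and work on it outside Coda, the same "does not contain" filter can be sketched in Python as follows. The rows below only mimic the shape of the table: the numbers and the "Drinking" prefix are made up.

```python
# Hypothetical rows mimicking the Loadings sync table; all numbers are made up.
loadings = [
    {"Variable Name": "Spirits", "Principal Component1": 0.61},
    {"Variable Name": "Wine", "Principal Component1": -0.52},
    {"Variable Name": "Drinking Percentage Explained", "Principal Component1": 0.47},
]

# Keep only the rows whose name does not contain "Percentage Explained".
weights_only = [
    row for row in loadings
    if "Percentage Explained" not in row["Variable Name"]
]
print([row["Variable Name"] for row in weights_only])
```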
To use two datasets in a doc
Coda allows only one instance of a sync table per doc. So the same sync table will be used for all your datasets. To add a second dataset to the Principal Components or Loadings sync table:
On the sync table, press Options
On the tab on the right, choose the PCA Pack
Press the “Add another sync” button
Select the data for your second analysis as you did for the first dataset
Once done, give your dataset a name by using “Add criteria” and selecting “Group”. In the entry field for the Group criteria, enter a name of your choosing, e.g. “MovieReviews” or “Drinking”
It’s advisable to also give a “Group” criteria to your first dataset
Press “Start sync”
To use the results of the PCA for your second dataset, create a view of the sync table and “Add filter”, selecting “Group” “is equal to” the group name you chose before.
You can use more than two datasets by repeating the steps above.
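The idea of one shared table with one filtered view per dataset can be sketched in Python as below. The row layout is an assumption about how the shared sync table looks once a Group has been set, and the “Movie A” row is made up.

```python
from collections import defaultdict

# Hypothetical rows of the shared sync table, each tagged with its Group.
rows = [
    {"Label": "France", "Pc1": -1.395, "Group": "Drinking"},
    {"Label": "Italy", "Pc1": -1.760, "Group": "Drinking"},
    {"Label": "Movie A", "Pc1": 0.25, "Group": "MovieReviews"},
]

# One "view" per group, equivalent to a filter "Group is equal to <name>".
views = defaultdict(list)
for row in rows:
    views[row["Group"]].append(row)

print([row["Label"] for row in views["Drinking"]])
```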