No Labels, No Proxy

Note: This section is an additional resource referenced in Activity 4: Check Your Labels in the Datasets phase.

When your data has no labels, this is where unsupervised machine learning (ML) comes in. For example, you may want to make sense of a large amount of unlabeled student data to work out what type of user behavior distinguishes an “engaged” learner from a “disengaged” one when progressing to more complex topics. Clustering algorithms might help you group students and label them appropriately, but they can also introduce new problems. The algorithm will use any available data to identify patterns or “clusters,” possibly relying on data, or combinations of data, that act as proxies for race or socioeconomic status.
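To make the clustering step concrete, here is a minimal k-means sketch in Python. k-means is one common clustering algorithm, not necessarily the one your tools use. The two behavioral features (activity switches per 10 minutes and minutes on task) and the eight students are hypothetical, invented only for illustration.

```python
import math
import random

# Hypothetical features per student: (activity switches per 10 minutes, minutes on task).
# These values are invented for illustration only.
students = [
    (1, 22), (2, 25), (1, 28), (2, 24),  # few switches, long time on task
    (9, 6), (11, 5), (10, 8), (12, 4),   # many switches, short time on task
]


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of numeric tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen students
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each student joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(values) / len(members) for values in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, labels


centroids, labels = kmeans(students, k=2)
print(labels)  # cluster ids only; the algorithm does not say which group is "engaged"
```

The algorithm only returns cluster numbers such as 0 and 1. Deciding that one cluster means “engaged” is a human judgment layered on top, which is where the labeling problems described here can creep in. In practice you would also put features on comparable scales before clustering, because k-means compares raw distances.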

For instance, your algorithm might find that students click between multiple activities when they don’t understand the concept at hand. At face value, this suggests these students are disengaged from the topic and have not yet mastered it. However, research might also show that hungry students click between multiple activities whether or not they understand the content. Your model might then incorrectly conclude that students who come to school hungry are not ready to advance to the next level, regardless of what they have actually learned. Again, it can be very difficult to understand the logic behind a clustering algorithm, so we encourage you to avoid unsupervised ML until the industry develops more transparent practices.
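One rough way to see whether clusters are standing in for a sensitive attribute is to keep that attribute out of the features and then compare how often it appears in each cluster. The sketch below assumes a hypothetical yes/no flag (a student reporting they arrived hungry) and a hypothetical 0.2 threshold; the cluster labels, flags, and function names are all illustrative, not a standard method.

```python
from collections import defaultdict

# Hypothetical outputs: cluster ids from a clustering step, plus a sensitive
# attribute that was NOT used as a feature (an invented "arrived hungry" flag).
cluster_labels = [0, 0, 0, 0, 1, 1, 1, 1]
arrived_hungry = [False, False, True, False, True, True, True, False]


def attribute_rate_by_cluster(labels, flags):
    """Return {cluster id: share of its students with the flag set}."""
    if len(labels) != len(flags):
        raise ValueError("need exactly one flag per cluster label")
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for cluster, has_flag in zip(labels, flags):
        totals[cluster] += 1
        flagged[cluster] += int(has_flag)
    return {c: flagged[c] / totals[c] for c in sorted(totals)}


def possible_proxy(rates, max_gap=0.2):
    """Flag the clustering if attribute rates differ across clusters by more than max_gap.

    The 0.2 default is a hypothetical threshold, not an established standard.
    """
    return max(rates.values()) - min(rates.values()) > max_gap


rates = attribute_rate_by_cluster(cluster_labels, arrived_hungry)
print(rates)                  # {0: 0.25, 1: 0.75}
print(possible_proxy(rates))  # True
```

A large gap is a warning sign worth investigating, not proof of a proxy, and a small gap does not show the clustering is fair. The check also requires collecting the sensitive attribute in the first place, and it does nothing to make the algorithm’s internal logic more transparent.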
