This section explains how to explore the racial bias that may currently or historically contribute to the data you've collected. All algorithms, however basic or complex, require a dataset. Although data is often considered “objective,” it is far from it. Data reflects the dynamics, including bias, of a given context. Racial bias exists in many forms in today’s school systems as a result of decades of racial and socioeconomic segregation, as well as new and old systemic structures that disadvantage Black, Brown, and low-income students. This section also includes some basic code to demonstrate how you might find blind spots in your dataset. As technologists in education, we must think critically about the datasets we use and the biases they perpetuate, even when they are “accurate.” This section covers common types of biases that can exist in your dataset and ways to address these issues. After this section, you'll investigate the ways you use this data in your algorithms.

🎯 Goals
- Check your dataset for existing racial bias
- Identify missing or biased data and modify your dataset
- Identify forms of bias that must be addressed outside of your algorithm
- Future-proof your dataset with an update plan

🚧 Caution!
Make sure you understand the context of your dataset. Take responsibility for understanding the social contexts in which the data was collected, and the history behind this data. Even if you get your data from a school, you should be able to thoroughly explain the context behind each of the data points yourself. If you’ve collected the data yourself from within your product, make sure the whole team understands what this data represents and the social context in which students and teachers use your product. Ask yourself questions such as:
- What data do we need?
- Where and how did/will we get the data?
- What is correlated with race?
- What racial bias might exist in the data?
- Are the impacted populations comfortable with what the data suggests? Are these aligned to the outcomes they want?
- Does the data represent our expected user population?
- Is there enough data in all quadrants?

✅ Activities for Datasets
Section 1: Finding Problems
Activity 1: Who Should Be Present in Your Dataset

This activity emphasizes basic good data practice, but it's still important to call out before you check for blind spots. Identify which students make up your current and target populations. How does their profile differ from that of an “average” hypothetical student, and what cultural challenges and perspectives are important to represent? If you hope to serve mostly large urban school districts across America, your dataset should represent a more diverse population than a subset of smaller, suburban districts. It is fine to begin with a small dataset while partnering with a few customers, but before you set your sights on a larger customer base, make sure you can access a larger, representative dataset that will match the population of schools you hope will deploy your product. Research shows that you need to validate that your model works well for students of different races, for English Language Learners (ELL) vs. non-ELL students, and for other relevant subgroups. Exercise: Identify your target population and research the demographics of students and teachers at these schools. Compare this to the demographic breakdown of your dataset, and modify where needed. Contrary to what you might think, you should make sure race is included! Even if you can’t collect race for individual student data points, you can use the overall demographic breakdown of schools whose data you use.
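As a starting point, here is a minimal sketch of that comparison in Python. The file name, column names, and demographic shares are hypothetical; the target shares would come from public data (e.g., district or state report cards) for the schools you hope to serve.

```python
import pandas as pd

# Hypothetical target demographics for the schools you hope to serve,
# taken from public district or state report-card data.
target = pd.Series({"Black": 0.24, "Latino": 0.33, "White": 0.28,
                    "Asian": 0.09, "Other": 0.06}, name="target_share")

# Your own records; assumes a "race" column (individual or school-level).
students = pd.read_csv("students.csv")
dataset_share = students["race"].value_counts(normalize=True).rename("dataset_share")

comparison = pd.concat([target, dataset_share], axis=1).fillna(0)
comparison["gap"] = comparison["dataset_share"] - comparison["target_share"]

# Flag groups underrepresented by more than 5 percentage points.
print(comparison[comparison["gap"] < -0.05])
```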
Activity 2: Check for (Data) Blind Spots

Now that you know who should be represented in your dataset, identify what scenarios should be present in your dataset too. Machine learning algorithms can only learn based on the data you provide. This places enormous responsibility in the hands of the provider of data (you!). If there are blind spots in your data, your algorithm will struggle to handle students who fall into those blind spots. For example, even if you have enough non-white students represented in your data, if your dataset contains few examples of non-white students who “succeed,” your algorithm will struggle to advance non-white students.
You can use the short example below to explore ways to find blind spots in your datasets. This is an often-skipped yet critical step. You should understand not only your target population of students but also the type of students that should be represented in your dataset. You will need to address these blind spots by improving your dataset. For example, if you find that your data has only a few examples of Black students who receive all A’s throughout high school, you’ll need to modify your dataset to widen your sample. During development, we encourage you to share your blind spots with the schools you work with, and especially with entities that provided data to you. They might recognize certain blind spots as problems that you may not recognize on your own. Some of these gaps you may not be able to fix, but you will need to consider them in the design and implementation of your product.
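One way to surface blind spots is to cross-tabulate student groups against your outcome label and look for combinations with very few examples. This is a minimal sketch; the file name, column names, and threshold are hypothetical.

```python
import pandas as pd

students = pd.read_csv("students.csv")   # assumed columns: race, outcome

# Cross-tabulate race against the outcome label (e.g., "succeeded" / "struggled").
counts = pd.crosstab(students["race"], students["outcome"])
print(counts)

# Flag group/outcome combinations with too few examples for a model to learn from.
MIN_EXAMPLES = 30   # arbitrary threshold; adjust to your dataset size
sparse = counts.stack()
sparse = sparse[sparse < MIN_EXAMPLES]
print("Potential blind spots (group x outcome with few examples):")
print(sparse)
```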
“But race is a sensitive data point, so we don’t collect it”, said every company ever. Fairness through unawareness usually does not work, and treating all students “the same” can lead to gaps in awareness about how technologies and algorithms disadvantage particular populations. Tracking race and other sensitive data points will help you identify implicit bias, representation or sample bias, and less obvious blind spots in your data. Even if you aren’t able to collect individual students’ race, you can always use schools’ demographic breakdowns, which are public information.
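If you only have school identifiers, one practical approach is to join public school-level demographics onto your records and audit at that level. A minimal sketch, assuming hypothetical file and column names (the public data could come from NCES or state report cards):

```python
import pandas as pd

students = pd.read_csv("students.csv")                  # has a school_id column
school_demo = pd.read_csv("school_demographics.csv")    # school_id, pct_black, pct_latino, ...

students = students.merge(school_demo, on="school_id", how="left")

# You can now compare outcomes or model behavior across school demographic
# profiles, even without individual-level race data. Percentages assumed 0-100.
students["majority_bipoc_school"] = (
    students["pct_black"] + students["pct_latino"]) > 50
print(students.groupby("majority_bipoc_school")["outcome"].mean())
```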
Activity 3: Check for Proxies

"We don't collect race data, so we're good, right?" Race shouldn't be an input feature for a machine learning algorithm in education (and this is illegal in many cases). However, your data might implicitly encode race in other features. Many features, or combinations of features, in your dataset might correlate strongly with race, such as free and reduced lunch (FRL) status, discipline records, and zip code. You may also find that combinations of features correlate with race in ways you didn’t expect. These findings require you to think deeply about why this might be the case and to share them with schools you work with. For example, students from certain demographic groups may ask for help at different rates, reflected in the number of hints requested. Talk with schools to learn what this might mean and how your model could perpetuate or account for this behavior. Furthermore, even if you don’t collect race or any proxies for it, you are still responsible for testing how your algorithms and products perform for students of different races. To Do: Audit the features and factors used as inputs to your algorithm to identify the ones that act as problematic proxies (see the sketch after Activity 4 below).

Activity 4: Check Your Labels

You will need to use labels to categorize data in your dataset (e.g. you might assign labels in the form of test scores, academic grades, dropout events, etc. to each student record). It's easy to think of labels as the "truth," the same way we often treat school test results or true/false events as “objective” measures of learning or success. In reality, many of these labels are influenced by bias in education. If you don't explicitly investigate potential bias, your ML model will encode this bias into your product. This is especially true if you choose to use unsupervised ML, which will use algorithms to predict labels or cluster data according to patterns the algorithm finds, rather than labels your team provided. Algorithms can introduce even more bias when creating their own categories than a thoughtful human might. Furthermore, many of these metrics, like grades or dropout events, suffer from a lack of context and confirmation: Were grades assigned fairly? Would other teachers agree with the grade determination? Was the content culturally accessible, or did it mention places most of the school’s Black students have never visited? Did zero-tolerance suspension policies change after the data was recorded? What other pieces of data about a student, like the projects they created or the way they communicate their ideas and work with other students, are missing? Can we accurately categorize a dropout as “negative”? These are already complex questions in the education space, and they are further complicated by the use of technology and algorithms. It is even more important that you and the schools you work with openly discuss the ways bias is reflected in school data and how your product incorporates this data, so that humans at the school can interpret this information correctly.
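Here is a minimal sketch of the proxy audit mentioned in Activity 3's To Do: if a simple model can predict race from the features you feed your algorithm, those features act as proxies. The file name, feature names, and model choice are hypothetical; race here is used only for auditing, never as a model input.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

students = pd.read_csv("students.csv")

# The features your product actually uses (race is NOT one of them).
features = ["frl_status", "zip_code", "discipline_count", "hints_requested"]
X = pd.get_dummies(students[features], columns=["frl_status", "zip_code"])
y = students["race"]   # self-reported or school-level, used only for this audit

# If this classifier beats the majority-class baseline by a wide margin,
# your input features encode race, individually or in combination.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
accuracy = cross_val_score(clf, X, y, cv=5).mean()
baseline = y.value_counts(normalize=True).max()

print(f"Race predicted from model inputs: {accuracy:.2f}")
print(f"Majority-class baseline:          {baseline:.2f}")
```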
Proxy Labels and Label Prediction: Sometimes the exact labels you want won’t exist, so you'll use a different feature (or metric) as a proxy for the label you want to use. For example, you may choose to use time spent on an assignment as a proxy for student engagement, but these two features are not exactly the same. Make sure you investigate the relationship between your ideal label and the feature you use as its proxy to assess the ways this could go wrong. In this example, what contributes to time spent that may not apply to student engagement? When might time spent and student engagement not be correlated? Perhaps students who are confused or bored will take longer to move through the same modules. Make sure you account for these in the conclusions you take away from the data or, eventually, the way you interpret the output from your algorithms.
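One way to pressure-test a proxy label is to hand-label a small sample (e.g., from classroom observation or teacher ratings) and check how well the proxy tracks it, overall and for each student group. A minimal sketch, with hypothetical file and column names:

```python
import pandas as pd

sample = pd.read_csv("hand_labeled_sample.csv")
# assumed columns: time_on_assignment, engagement_rating, race

# Overall agreement between the proxy and the label you actually care about.
print(sample["time_on_assignment"].corr(sample["engagement_rating"]))

# A proxy that tracks engagement for some groups but not others will
# quietly encode bias into everything trained on it.
for group, rows in sample.groupby("race"):
    corr = rows["time_on_assignment"].corr(rows["engagement_rating"])
    print(f"{group}: {corr:.2f}")
```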
To Do: What if my dataset doesn’t have labels or a proxy? See the Appendix for guidance.
Activity 5: Check Your Values

With machine learning, the labels you choose will teach your algorithm what is “right” and “wrong”. Your machine learning algorithm will eventually learn to output its own values based on what it has learned. For example, imagine that an algorithm learns from real discipline data in which Black boys are suspended at a higher rate compared to other students. The algorithm will "correctly" learn to categorize Black boys' behavioral records as severe and reinforce the existing bias, based on what it learned from real data. Rather than automating the bias in existing systems, what you determine to be “right” or “wrong” should be informed by the values of the schools and communities you work with. You’ll need to label your training data to teach your algorithm that the suspension of a Black boy and not a white boy with the same behavioral record is "wrong", so that the judgment of your algorithm aligns with your desired values and not the current system. This is an obvious case, but note that simply including more data from Black students in your training data won’t solve the problem, because we don’t have enough education systems free from racial bias to provide examples of Black students treated fairly.
Research on algorithmic hiring describes a similar challenge in biased recruiting: “It may be even more challenging in other arenas to find a target variable that does not encode racial skewing vis-à-vis the actual outcome of concern. In the employment context, for instance, employers want to predict success on the job. But the data on past success may be skewed by the company’s past discrimination in hiring or promotion practices. There is nothing in the past data that reliably represents “job success” in a nondiscriminatory environment.” To Do: Ask schools you work with about the world they want to live in and reflect on the type of school environment you hope to support with your product.
For example, Black boys are suspended at higher rates than other students. This toolkit aims to help you identify the historical biases that create this phenomenon so that you can mitigate them rather than automating the existing bias in the system. It may be necessary to modify labels to account for scenarios in which Black boys should not have been suspended, to create a machine learning algorithm that aligns with the community’s values.
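A rough way to spot this kind of label bias before training: for records with comparable behavior, compare disciplinary outcomes across groups. This is a minimal sketch with hypothetical file and column names, not a substitute for conversations with schools about what the data means.

```python
import pandas as pd

records = pd.read_csv("discipline_records.csv")
# assumed columns: behavior_category, race, suspended (0/1)

# Suspension rate for each behavior category, broken out by race.
rates = (records
         .groupby(["behavior_category", "race"])["suspended"]
         .mean()
         .unstack("race"))
print(rates)

# Large gaps within the same behavior category suggest the "suspended" label
# reflects biased treatment, not just behavior; those labels may need to be
# modified, reweighted, or excluded before training.
print((rates.max(axis=1) - rates.min(axis=1)).sort_values(ascending=False))
```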
Activity 6: Keep Your Dataset Fresh

Datasets are not static: they grow and adapt over time. If your dataset is composed purely of student data from 30 years ago, your results won’t be relevant for students today. It is important to craft a forward-looking dataset strategy at the start to ensure your data stays relevant.
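A quick way to keep an eye on this is to track when each record was collected and flag how much of the dataset falls outside your refresh policy. A minimal sketch, assuming a hypothetical collected_date column:

```python
import pandas as pd

students = pd.read_csv("students.csv", parse_dates=["collected_date"])

# Age of each record in years.
age_years = (pd.Timestamp.today() - students["collected_date"]).dt.days / 365
print(age_years.describe())

# Share of the dataset older than your refresh policy allows.
MAX_AGE_YEARS = 3   # example policy; set this with your team and your schools
stale = (age_years > MAX_AGE_YEARS).mean()
print(f"Records older than {MAX_AGE_YEARS} years: {stale:.1%}")
```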
To Do: Depending on the type of data you’re using and your use case, you may need to update your dataset every year, or even more often. Here are some questions to consider:
Questions About Your Dataset
Summary Questions for Finding Problems

- Does your data represent the population of schools you want to work with?
- What do your labels actually mean? Are they an actual categorization or a proxy?
- Where did your labels come from? What is the history behind them? Who created them? What bias may have influenced them?
- Would others agree with these labels? Were the labels cross-checked or validated?
- Do the labels align with your company values and those of the schools you work with?
- Do you have a plan for where to get new labels as your datasets change?

Section 2: Addressing Problems
So you’ve found some tricky places where bias might come into your product through a dataset you are using. What can you do now? This is a complex and very important area. These challenges may bring up difficult conversations within your team, but know that the discomfort is worthwhile. Change begins with awareness and requires continuously challenging conversations. Just like people, every dataset has its own “perspective” or bias. It is much better to investigate this bias than to turn a blind eye. Fairness through unawareness usually does not work.
Unfortunately, there’s no magic wand to “unbias” your dataset – this is still an active area of research. New research from MIT recommends how to identify the additional data needed to address gaps in your dataset. However, you can also use strategies to ensure your model is “biased” in a way that aligns with your and your schools’ values. For example, if you recognize that Black students demonstrate slower learning progress because of a suite of challenges they’ve faced in their schools, how can your model provide or recommend additional, tailored support to these students so they achieve the same outcomes as other students? This helps you focus on achieving equitable outcomes that school communities believe are fair, rather than blindly treating all students the same and ignoring the fact that your product works better for some students than others. In this section, we’ll help you identify what is and isn’t fixable and scope the work required to fix it. For the unfixable elements, we encourage you to reconsider whether ML is appropriate. If you choose to proceed, make sure to disclose the problems you couldn’t address. Education is messy, but it is also a team sport. Work closely with schools, students, and parents to tackle problems that cannot be addressed with a technical solution. Your unsolved problems may uncover challenges of which schools were unaware, and your data might help them make a case for change. You should also consider these areas in the design and implementation of your product, which are discussed in later sections. This section helps you:

- Consider strategies for augmenting your dataset
- Understand what is fixable within your algorithm and what must be fixed elsewhere
- Scope the work required to fix it
- Reconsider: is ML still appropriate?
Activity 1: Document The Bias You Find

This is a touchy subject, we know, but the first step is to document the areas of bias you find. It is the best way to brainstorm with your team and to remember what should be disclosed to schools, students, and families. Many inequities cannot be addressed or fixed within your technology, but you should partner with schools to not only bring these issues to their attention but also make sure students, teachers, and families are aware of how to use your product in a way that does not perpetuate these biases. You can start with a table like the one below with your team.

| Bias we found | Can we address it in the dataset? | If no, how should we disclose this? |
| --- | --- | --- |
|  |  |  |
Note: This example calls out race and language, but bias can appear in many other features, including proxies for race such as income, zip code, or learning challenges.
Modify Data to Address Bias
You can use statistical techniques to address biases you find, such as amplifying low-frequency examples. For example, if a training dataset associates English language learners with lower academic outcomes, you can statistically amplify examples of “successful” students with accents. For instance, take a literacy app that assesses students by listening to them read out loud. Imagine that in the schools that provided you with data, there were only a few English learners, and most of these English learners answered questions incorrectly because they had not received the language support they needed to catch up. As a result, you have only a few data points for students with accents who answered questions correctly. Your algorithms would have a hard time recognizing correct answers spoken with an accent and might also learn from these patterns that students with accents are typically wrong. You can address these trends by incorporating more examples of students with accents who provide the correct answer, or by amplifying the weight of the examples you do have. If you aren’t able to address this within your dataset, disclose these concerns to the schools you work with or to education researchers – these are problems they grapple with on a daily basis.
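Two common ways to do that amplification are oversampling the under-represented slice or passing per-example weights to your learner. This is a minimal sketch under assumed file and column names; the multiplier of 5 is arbitrary and should be tuned and validated, not treated as a fix in itself.

```python
import pandas as pd
from sklearn.utils import resample

data = pd.read_csv("reading_assessments.csv")
# assumed columns: ell_status (bool), correct (0/1), plus audio/text features

# The under-represented slice: English learners who answered correctly.
slice_mask = data["ell_status"] & (data["correct"] == 1)
minority_slice = data[slice_mask]

# Option 1: oversample that slice so training sees it more often.
oversampled = resample(minority_slice, replace=True,
                       n_samples=len(minority_slice) * 5, random_state=0)
train = pd.concat([data, oversampled], ignore_index=True)

# Option 2: keep the data as-is and weight those examples more heavily
# (most scikit-learn estimators accept sample_weight in fit()).
weights = pd.Series(1.0, index=data.index)
weights[slice_mask] = 5.0
```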
Note: Fixing datasets requires an understanding of different approaches to “fairness”, a very active and hotly debated research area. We will discuss this more in a later section. The example code above demonstrates a few techniques for mitigating dataset bias.

🎯 Conclusion: It's all in the data
It's critical that you understand the full context around the data you collect and use to make conclusions, train your algorithms, and improve your product over time. Decades of racial and socioeconomic bias are embedded in educational outcome data and still impact student outcomes every day. As you build your product on top of new and existing data, make sure you understand how this data is recorded, what factors influence each data point, and what possible outliers might challenge your assumptions. This section provided examples of blind spots that can bias your datasets and suggestions for how to address them before you train your algorithms.
Now that you've evaluated your datasets, let's move on to your algorithms.