Machine learning models: In bias we trust?
When the stakes are high, machine-learning models are sometimes used to aid human decision-makers. For instance, a model could predict which law school applicants are most likely to pass the bar exam to help an admissions officer determine which students should be accepted.
These models often have millions of parameters, so how they make predictions is nearly impossible for researchers to fully understand, let alone an admissions officer with no machine-learning experience. Researchers sometimes employ explanation methods that mimic a larger model by creating simple approximations of its predictions. These approximations, which are far easier to understand, help users determine whether to trust the model’s predictions.
But are these explanation methods fair? If an explanation method provides better approximations for men than for women, or for white people than for Black people, it may encourage users to trust the model’s predictions for some people but not for others.
MIT researchers took a hard look at the fairness of some widely used explanation methods. They found that the approximation quality of these explanations can vary dramatically between subgroups and that the quality is often significantly lower for minoritized subgroups.
In practice, this means that if the approximation quality is lower for female applicants, there is a mismatch between the explanations and the model’s predictions that could lead the admissions officer to wrongly reject more women than men.
Once the MIT researchers saw how pervasive these fairness gaps are, they tried several techniques to level the playing field. They were able to shrink some gaps, but couldn’t eradicate them. More