I think many are confused about bias in artificial intelligence.
I think the term should refer to cases where you give an algorithm training data that doesn't represent reality. You thought you were telling the AI how the world really is, but for some sampling-related reason you failed to do that.
The result is poor predictive capabilities or some other negative effect.
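To make the sampling point concrete, here is a minimal sketch of that first kind of bias. It assumes a hypothetical population where 30% of users prefer product "A", and the "model" is simply the observed share in the training sample. The population, the product labels, and the `estimate_share` helper are all illustrative, not taken from any real system.

```python
from collections import Counter


def estimate_share(sample, label):
    """Fraction of `sample` equal to `label`: the simplest possible 'model'."""
    counts = Counter(sample)
    return counts[label] / len(sample)


# Hypothetical population: 30% of users prefer "A", 70% prefer "B".
population = ["A"] * 300 + ["B"] * 700

# Representative sample: every 10th user, so the proportions are preserved.
representative = population[::10]

# Skewed sample (hypothetical): gathered mostly where "A" fans congregate,
# modeled here as all 300 "A" users plus only 100 "B" users.
skewed = population[:300] + population[300:400]

print(estimate_share(representative, "A"))  # 0.3
print(estimate_share(skewed, "A"))          # 0.75
```

The skewed sample doesn't produce an "offensive" answer; it produces a wrong one, because the data never matched the world. That is the only situation the essay calls bias.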
What I think we could see a lot of, though, are situations where algorithms are given accurate data about the world, but the analysis the AI produces is offensive in some way.
This could take a couple of forms that I can think of immediately:
The AI tells us something about reality that is uncomfortable.
The AI creates a stereotype of groups by surfacing options for "people like them".
In the first case, analysis of larger and larger datasets is likely to reveal truth in an uncomfortable way. For example, it might show that Asian women don't often select black men as potential dates. If that's what the data shows, it's reality, but in the polite and insulated world of common courtesy we like to believe everyone likes everyone else equally.
Big data analysis and AIs will peer through political correctness and show us things we don’t want to see or talk about.
In the second case, you might tell an AI that you’re a Trump supporter who didn’t go to college, and it might recommend a local gun shop or a NASCAR event. Or maybe a way to make money in a tough economy. And people might find that rude.
Now imagine all the various ways this awkwardness could play out, for different ethnic groups, different socio-economic groups, education levels, etc.
Basically, we need to understand the difference between an AI trained on bad data, in the sense that the data doesn't represent reality, and an algorithm producing accurate views of reality that make different groups unhappy.
There will be tremendous pressure, for political reasons, to treat the second situation (accurate but unwelcome output) as if it were the first (bad training data).
But in practice, what engineers and product teams might do is simply write a hard rule that removes a given analysis or recommendation, even though feeding the model more and more quality data about the world would keep yielding the same result.
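A minimal sketch of what such a hard rule can look like, assuming a recommender whose output is post-filtered by a hand-written suppression list. The segments, items, `model_recommend`, and `SUPPRESSED` are hypothetical placeholders, not any real product's code. The point it shows is that the rule hides an output without changing what the model learned.

```python
from typing import Dict, List, Set


def model_recommend(segment: str) -> List[str]:
    """Stand-in for a trained model; this mapping is hypothetical."""
    learned = {
        "segment_1": ["item_a", "item_b", "item_c"],
        "segment_2": ["item_b", "item_d"],
    }
    return learned.get(segment, [])


# Hypothetical hard rule added by a product team:
# never show these items to this segment, whatever the model says.
SUPPRESSED: Dict[str, Set[str]] = {"segment_1": {"item_b"}}


def recommend(segment: str) -> List[str]:
    """Return the model's recommendations minus any suppressed items."""
    raw = model_recommend(segment)
    blocked = SUPPRESSED.get(segment, set())
    return [item for item in raw if item not in blocked]


print(model_recommend("segment_1"))  # ['item_a', 'item_b', 'item_c']
print(recommend("segment_1"))        # ['item_a', 'item_c']
```

Retraining on more data would not remove `item_b` from the model's own output; only the filter does.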
Another example might be an algorithm recommending a skin-whitening product to a woman in Shanghai. If a PC group in San Francisco hears about this, they'll say the algorithm is biased toward white people and against people of color.
But the truth might be that it was a great product match, because so many women exactly like this user want that product, and in fact she did too.
In short, algorithms aren't biased just because they reveal a version of the world that we don't want. They're only biased if they fail to represent reality. We have to understand this distinction, and work to keep the line between these two situations as bright as possible.
And perhaps it's OK to tweak algorithms so they don't produce results that could offend anyone. That's a product decision that people should be allowed to make. But I have a feeling that companies that lean strongly in this direction will face fierce competition from those that let unpleasant truths shine through.
I think the better algorithms get, and the more data they see, the more insightful and potentially awkward truths will be revealed to us.
We will simply have to acclimate to this reality as a byproduct of machine learning.