For anyone who’s been worried that their online fingerprint was going to be used against them, this paper is your vindication.
Here’s how they describe the analysis:
As a simple example, every website can effortlessly track whether a customer is using an iOS or an Android device; or track whether a customer comes to the website via a search engine or a click on a paid ad. In this project, we seek to understand whether the digital footprint helps augment information traditionally considered to be important for default prediction and whether it can be used for the prediction of consumer payment behavior and defaults.
Fair enough, but from there it just gets scary.
None of this is surprising to me. I’ve been arguing for a long time that this is what big data and machine learning are going to bring us. But it’s still startling to see it happen.
They proceed to look at the various markers we leave behind online, which companies can collect while you’re on their website (or buy from somewhere else) and combine with your credit score to estimate your chance of default.
What if a company wants to estimate your creditworthiness (or other types of worthiness) without pulling your credit? They can do this analysis, or something like it, for free by looking at all this public information you drop while using the internet.
Companies can do this while you’re on their website: they can easily tell your OS just from you showing up, along with the site you came from. And if they’re getting data from you, they can learn where you live, what your email address is, and tons more that they can use to rate you.
And these things evidently matter a lot in predicting default.
When combining information from both variables (“Operating system” and “Email host”), default rates are even more dispersed. We observe the lowest default rate for Mac-users with a T-online email address. The default rate for this combination is 0.36%, which is lower than the average default rate in the 1st decile of FICO scores. On the other extreme, Android users with a Yahoo email address have an average default rate of 4.30%, significantly higher than the 2.69% default rate in the highest decile of FICO scores.
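To make the "even more dispersed" point concrete, here is a minimal Python sketch of the arithmetic: compute the default rate per group, first by one variable and then by the combination. The records are invented toy data with deliberately exaggerated rates, not figures from the paper, and `default_rates` is a hypothetical helper name.

```python
from collections import defaultdict

def default_rates(records, keys):
    """Return {group: share of defaults} for groups defined by `keys`."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for r in records:
        group = tuple(r[k] for k in keys)
        totals[group] += 1
        defaults[group] += r["defaulted"]  # 1 = defaulted, 0 = paid
    return {g: defaults[g] / totals[g] for g in totals}

# Invented toy records, for illustration only (not the paper's data).
records = [
    {"os": "macos",   "email_host": "t-online.de", "defaulted": 0},
    {"os": "macos",   "email_host": "t-online.de", "defaulted": 0},
    {"os": "macos",   "email_host": "yahoo.com",   "defaulted": 0},
    {"os": "macos",   "email_host": "yahoo.com",   "defaulted": 1},
    {"os": "android", "email_host": "t-online.de", "defaulted": 0},
    {"os": "android", "email_host": "t-online.de", "defaulted": 1},
    {"os": "android", "email_host": "yahoo.com",   "defaulted": 1},
    {"os": "android", "email_host": "yahoo.com",   "defaulted": 1},
]

by_os = default_rates(records, ["os"])
by_combo = default_rates(records, ["os", "email_host"])
```

In this toy set, grouping by OS alone gives rates of 0.25 and 0.75, while the combined groups range from 0.0 to 1.0. That wider spread is the effect the quote describes, just at cartoon scale.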
In general, there were a few things that jumped out as predictors.
- iOS vs. Android (iOS users were around half as likely to default)
- People with their name in their email address defaulted less
- Desktops defaulted far less than mobile
- People with numbers in their emails defaulted more
- People with old domain emails (Hotmail, Yahoo) defaulted more
- People who ordered at night instead of in the afternoon defaulted more
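To show how little effort it takes to collect the signals in this list, here is a rough Python sketch that derives them from things a site already sees on an ordinary order. It is not the paper's method. The function names, the search host list, and the "night means before 6am" cutoff are all my own assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical list of search engine hosts, for illustration only.
SEARCH_HOSTS = {"www.google.com", "www.bing.com", "duckduckgo.com"}

def detect_os(user_agent: str) -> str:
    """Very rough OS guess from a User-Agent string."""
    ua = user_agent.lower()
    # Check iOS first: iPhone user agents also contain "like Mac OS X".
    if "iphone" in ua or "ipad" in ua:
        return "ios"
    if "android" in ua:
        return "android"
    if "windows" in ua:
        return "windows"
    if "macintosh" in ua or "mac os x" in ua:
        return "macos"
    return "other"

def footprint(user_agent, referrer, email, first_name, last_name, order_hour):
    """Build a feature dict from one order; all field names are hypothetical."""
    local, _, host = email.lower().partition("@")
    os_name = detect_os(user_agent)
    ref_host = urlparse(referrer).netloc.lower() if referrer else ""
    if not ref_host:
        channel = "direct"
    elif ref_host in SEARCH_HOSTS:
        channel = "search"
    else:
        channel = "other"
    names = [n.lower() for n in (first_name, last_name) if n]
    return {
        "os": os_name,
        # Simplification: treats every iOS/Android device as mobile.
        "device": "mobile" if os_name in {"ios", "android"} else "desktop",
        "channel": channel,
        "email_host": host,
        "email_has_digits": any(c.isdigit() for c in local),
        "email_has_name": any(n in local for n in names),
        # Assumption: "night" means an order hour before 6am local time.
        "night_order": order_hour < 6,
    }
```

Feed it a User-Agent header, a referrer, and the email and name from a checkout form, and you get back every item on the list above without the customer being asked for anything extra.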
The most interesting thing about this is how easily this stuff can be (and is) gathered from users just during a regular browsing session. Especially if you’re on the website that is going to be making the decision.
According to the paper, these tech fingerprint ratings predict default as well as or better than actual credit scores.
To me, this is what big data is all about. It’s not what it should be about. But it is what it’ll be used for. And it already is.
Big data combined with machine learning has only one purpose, and that is to answer questions and make predictions.
The question, “Will this person pay me back?”, is one of the oldest in human history.
Expect AI and data science to focus on questions like those first.
And if you were worried that your internet droppings might one day be used to judge you, don’t worry anymore.
It’s absolutely true, and it’ll only become more so as the technology advances.