Machine Learning is the New Statistics


I’ve been trying to think of a way to describe how big Machine Learning is, and I think I finally have a decent one:

Machine Learning is the new Statistics.

New in the sense that it builds upon Statistics rather than replacing it.

Why Statistics?

Because Statistics is the primary mechanism we’ve had for decades for learning about the world. It’s an approach to looking at data and learning something useful from it. It’s data gathering, manipulation, and analysis.

Machine Learning is similar, except that its way of doing so is far more powerful.

Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed. ~ Arthur Samuel

Most importantly, Machine Learning can…well, learn. With traditional Statistics, you can potentially extract additional insights with more (and better) data, but the model for doing the analysis itself doesn’t improve. With Machine Learning that’s the entire point—it improves itself based on more data.

Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. ~ Google Answers

To be clear, it’s not just improving the results with better data. It’s improving its own ability to get results based on data. The difference is incredibly significant.
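As a rough illustration of what I mean by a model improving as it sees more data, here is a minimal sketch in Python. Everything in it is an assumption for the sake of the example: the data comes from a made-up noisy line (y = 3x + 2), and the names `make_stream`, `OnlineLinearModel`, and `learning_curve` are hypothetical, not from any library. The model starts out knowing nothing, adjusts itself a little after every example, and its error on held-out data is recorded along the way.

```python
import random


def make_stream(n, seed=0):
    """Yield n (x, y) pairs from a hypothetical noisy line y = 3x + 2."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + 2.0 + rng.gauss(0.0, 0.1)
        yield x, y


class OnlineLinearModel:
    """One-feature linear model updated one example at a time (SGD)."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0  # slope, starts with no knowledge
        self.b = 0.0  # intercept, starts with no knowledge
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """Take one gradient step on the squared error of a single example."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


def mean_squared_error(model, data):
    return sum((model.predict(x) - y) ** 2 for x, y in data) / len(data)


def learning_curve(checkpoints=(10, 100, 1000)):
    """Return held-out error after the model has seen each number of examples."""
    test_set = list(make_stream(200, seed=1))  # data the model never trains on
    model = OnlineLinearModel()
    curve = {0: mean_squared_error(model, test_set)}
    stream = make_stream(max(checkpoints), seed=0)
    for seen, (x, y) in enumerate(stream, start=1):
        model.update(x, y)
        if seen in checkpoints:
            curve[seen] = mean_squared_error(model, test_set)
    return curve


if __name__ == "__main__":
    for seen, error in learning_curve().items():
        print(f"examples seen: {seen:5d}   held-out error: {error:.4f}")
```

Running it prints the held-out error at each checkpoint, which should shrink as the model sees more examples. This is a toy, and it uses plain stochastic gradient descent as a stand-in for "learning"; real systems are far more involved, but the loop of predict, measure the error, and adjust is the part I'm pointing at.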


So Machine Learning is not merely a new trick, a trend, or even a milestone. It’s not like the next gadget, instant messaging, smartphones, or even the move to mobile.

It’s nothing less than a foundational upgrade to our ability to learn about the world, which applies to nearly everything else we care about.

Statistics greatly magnified our ability to do that, and Machine Learning will take us even further.

I’m not an expert in whatever field this would be, but let’s call it Reality Analysis, and Machine Learning lands us around Level 4.

  • REALITY ANALYSIS LEVEL 1: Look at a random event and give some random explanation.

  • REALITY ANALYSIS LEVEL 2: Look at a random event and try to fit it into the past.

  • REALITY ANALYSIS LEVEL 3: Look at a random event relative to lots of other random events. Compare them using lots of other metadata and look for trends.

  • REALITY ANALYSIS LEVEL 4: Do Level 3, but keep improving your ability to gain new insights from the data by constantly and automatically improving your models.

  • It seems like Level 5 would be improving your ability to improve your ability, but I think that already is (and will be) included in Machine Learning.

In terms of impact, Machine Learning—in my opinion—is best described as a new and vastly superior way to learn about the world we live in, and that’s what sets it apart from pretty much anything else in computing right now.

It takes us from:

  1. Single Event Interpretation (ad hoc, non-data-related analysis)

  2. Statistics (fixed model trend discovery)

  3. Machine Learning (continuous model improvement)

Now we just need to combine it with ways to gather real-time data about the world’s state, and we’ll be on our way to even greater discoveries.

Notes

  1. Mar 25, 2021 — My understanding has changed (hopefully improved?) since writing this, and I’d mostly agree with what I wrote here, but I’d characterize statistics as the “human truth-seeking math” that ML and many other fields are based on. As I say above, they are not the same, and they don’t cancel each other out.

  2. When I say it’s the “new” Statistics, this doesn’t mean that it will leave Statistics behind as an old system. Machine Learning is arguably part of, and still uses, statistical analysis.

  3. 18.10.16: For those who insist that Machine Learning is nothing but a buzzword, and that it’s 100% the same as Statistics, I have to respectfully (as a non-expert) disagree. Machine Learning’s central concept (and advantage) is self-improvement of its models. That is NOT the central concept of Statistics. Saying it’s all statistics is like saying consciousness is “just another information processing mechanism”. It is, but it’s different and special enough to be considered separately. Self-improvement matters. A lot.
