I wrote recently about how AI’s biggest advantage in security—at least in the short term—will come from improving coverage rather than analysis quality.
I was talking specifically about things like detecting threats within a SOC, finding asteroids that might hit Earth, or listening for dangerous conversations in millions of phone calls. Basically, any situation where there just aren’t enough (trained) humans to look at what needs to be evaluated.
But there’s another use case for AI as well, and that’s on the attack side—along with its mirror image on defense.
Emulating crowdsourced security
The key advantage of crowdsourcing security—especially where the scope is large and relatively open—is that hundreds or thousands of people looking at a particular target can find the nooks and crannies that a single person might miss.
This is true just due to human nature. People have biases and experience that skew how they test. They also have limited willpower, organization skills, and time available to do a given test, so individual testers can sometimes miss minor things. And minor things can sometimes become major things.
Crowdsourcing security doesn’t solve this, but it addresses it. It takes the weaknesses of hundreds or thousands of testers and lays them on top of each other so that after many such layers you eventually cover all the surface. Someone might miss something obscure, but it’ll be caught by someone else.
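A rough way to see why the layering works is a back-of-the-envelope calculation. This is only a sketch under two simplifying assumptions of mine: every tester has the same small chance of spotting a given obscure issue, and testers miss things independently of each other. Real testers share biases, so the true numbers will be less flattering. The function name `chance_found` is made up for illustration.

```python
def chance_found(p_single: float, testers: int) -> float:
    """Probability that at least one of `testers` testers finds an issue.

    Assumes (unrealistically) that each tester independently finds the
    issue with the same probability `p_single`.
    """
    # The issue is missed only if every single tester misses it.
    return 1.0 - (1.0 - p_single) ** testers


# An obscure issue that any one tester has a 1% chance of noticing.
for n in (1, 10, 100, 1000):
    print(f"{n:>5} testers: {chance_found(0.01, n):.3f}")
```

Under those assumptions, something a single tester would almost certainly miss has roughly a 63% chance of being found by a hundred testers, and is close to certain to be found by a thousand.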
This function will be taken over by AI, which will look something like this:
- Automated Information Gathering
- Data Normalization
- ML Algorithms Extract Best Targets
So elite attack teams will basically rig up extraordinary automation systems that constantly crawl and parse the entire internet, with special focus on certain targets, and then take all the data they find and put it into a format that can be consumed by various types of algorithms.
Instead of needing 10,000 elite people to parse all this data and look for gems that could yield results, you can instead have your best algorithms looking at that same data, and only need dozens—or a few hundred—elite attackers to get the same benefit.
So as the data gathering, the algorithms, or the few human attackers improve, the results get better while taking less time and money.
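To make the three stages above a bit more concrete, here is a minimal sketch of the normalize-and-rank part. Everything in it is an assumption for illustration: the `Asset` schema, the field names in the raw records, the `SERVICE_WEIGHT` table, and the scoring heuristic, which is a hand-written stand-in for whatever ML model a real team would train. The gathering stage is replaced by a few hard-coded records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Common schema that every raw record is normalized into (hypothetical)."""
    host: str
    service: str
    days_since_update: int


# Stand-in for the gathering stage: records from different sources
# that use different field names and formatting.
RAW = [
    {"hostname": "Legacy.Example.com ", "svc": "Telnet", "days_since_update": 900},
    {"host": "www.example.com", "service": "https", "days_since_update": 10},
    {"host": "legacy.example.com", "service": "telnet", "days_since_update": 900},
    {"host": "db.example.com", "service": "mysql", "days_since_update": 200},
]

# Illustrative weights only; a real system would learn these from data.
SERVICE_WEIGHT = {"telnet": 3.0, "mysql": 2.0, "https": 1.0}


def normalize(raw: dict) -> Asset:
    """Map one raw record onto the common schema."""
    return Asset(
        host=str(raw.get("host") or raw.get("hostname", "")).strip().lower(),
        service=str(raw.get("service") or raw.get("svc", "unknown")).lower(),
        days_since_update=int(raw.get("days_since_update", 0)),
    )


def score(asset: Asset) -> float:
    """Toy heuristic: riskier services and staler hosts score higher."""
    staleness = min(asset.days_since_update, 365) / 365
    return SERVICE_WEIGHT.get(asset.service, 1.0) * (1 + staleness)


def best_targets(raw_records, top_n=3):
    """Normalize, drop duplicates, and return the highest-scoring assets."""
    assets = {normalize(r) for r in raw_records}  # the set removes duplicates
    # Sort by descending score, breaking ties by host name for stable output.
    return sorted(assets, key=lambda a: (-score(a), a.host))[:top_n]


for asset in best_targets(RAW):
    print(f"{score(asset):.2f}  {asset.host}  ({asset.service})")
```

The point of the sketch is the shape, not the heuristic: once everything is in one format, the ranking step can be swapped out and improved without touching the gathering step, and the humans only look at the top of the list.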
And it’ll be exactly the same for defensive teams.
They’ll have massive automation farms constantly polling their attack surface, extracting information from it, and putting that information into a data lake that can be parsed by their own algorithms.
Then their highly trained Blue Team will review the recommendations surfaced by the algorithms: the weakest points, and the most likely points of attack given what attacks are being seen in the wild, who the most likely threat actor is, and so on.
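On the defensive side, the review queue could be sketched like this. Again, this is a toy under stated assumptions: the `Finding` fields, the two lookup tables, and the 0.5 / 0.3 / 0.2 weights are all invented for illustration, and a simple weighted sum stands in for a real model.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One weak point surfaced by attack surface monitoring (hypothetical schema)."""
    asset: str
    weakness: float   # 0..1, how exposed the asset looks
    technique: str    # the attack technique this weakness enables


FINDINGS = [
    Finding("build.example.com", 1.0, "exposed_admin_panel"),
    Finding("mail.example.com", 0.4, "phishing"),
    Finding("vpn.example.com", 0.6, "credential_stuffing"),
]

# How often each technique is currently being seen in the wild (0..1, made up).
WILD = {"credential_stuffing": 0.9, "phishing": 0.7, "exposed_admin_panel": 0.2}

# How much the most likely threat actor favors each technique (0..1, made up).
ACTOR = {"credential_stuffing": 0.8, "phishing": 0.5}


def priority(finding: Finding, wild: dict, actor: dict) -> float:
    """Blend weakness, in-the-wild activity, and threat actor fit into one score."""
    return (
        0.5 * finding.weakness
        + 0.3 * wild.get(finding.technique, 0.0)
        + 0.2 * actor.get(finding.technique, 0.0)
    )


def review_queue(findings, wild, actor):
    """Return findings ordered from most to least urgent for the Blue Team."""
    return sorted(findings, key=lambda f: priority(f, wild, actor), reverse=True)


for f in review_queue(FINDINGS, WILD, ACTOR):
    print(f"{priority(f, WILD, ACTOR):.2f}  {f.asset}  ({f.technique})")
```

Notice that the most exposed asset does not end up first here. The VPN host outranks it because the technique it enables is both common in the wild and favored by the assumed threat actor, which is the kind of context the algorithms are supposed to bring to the humans.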
In five or ten years the amount of infrastructure that’ll be out there, and the amount of data it’ll be generating in terms of attack surface monitoring, will be far too much to ever catch up with using humans. There’s no amount of training, online courses, university education, or any other method that can create millions upon millions of trained infosec analysts. It’s a fantasy already, and it’s only becoming less and less possible as the Big Bang continues to expand.
So what we’ll have is the battle of the algorithms.
Attacker algorithms crawling everything and telling the humans where to focus, and defender algorithms crawling everything and telling them where to defend.
This probably isn’t the AI-to-InfoSec interaction you thought we were going to get, but I think it’s the one that’s coming.
Notes
- Max Tegmark and Sam Harris think that human-AI hybrids will have a short half-life, and I agree with that. Once AI becomes general, and is able to improve itself, even the pieces that humans had to do will get replaced. But we’re somewhere between 10 and 50 years from that according to most experts. My personal guess for AGI is around 15 years, but that’s just a feeling as a non-expert who’s read a lot of books about it.
- The explosion of new business systems and the data they create will be one multiplier to the data out there. The second factor will be the fact that we’ll want all existing and new systems to create vastly more data so that algorithms can make sense of it. And the third (and perhaps biggest) factor will be the explosion of IoT, which will create far more devices that create far more data.