A lot of people are starting to talk about how Machine Learning can help attackers and defenders in cybersecurity.
It’s an interesting topic, and I want to break it down along two axes that together give four cases: Supervised vs. Unsupervised, and Attack vs. Defense.
First, Supervised vs. Unsupervised.
Supervised Learning is where you want to know whether X is a Y or not. Is this a dog? Is this a real attack? Is that user malicious? Those are Supervised types of questions.
You feed Supervised ML algorithms by giving them two things:
- Tons of examples of situations where X was Y, and where X was not Y.
- Tons of data where we don’t know which it is.
The algorithm learns from the labeled examples, then decides which of the unknown cases are Y and which are not.
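To make those two inputs concrete, here is a minimal sketch in Python using only the standard library. The nearest-centroid approach, the feature names (failed logins per hour, MB uploaded), and all of the numbers are assumptions picked for illustration; real systems would use richer features and a proper ML library.

```python
from math import dist
from statistics import fmean

def train(examples):
    """Learn one centroid (average point) per label from (features, label) pairs."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Input 1: examples where we already know the answer (hypothetical data).
# Features are (failed logins per hour, MB uploaded); label is "was it an attack?"
labeled = [
    ((1.0, 5.0), False),
    ((2.0, 3.0), False),
    ((0.0, 8.0), False),
    ((40.0, 120.0), True),
    ((55.0, 90.0), True),
    ((35.0, 150.0), True),
]

# Input 2: data where we don't know which it is.
unlabeled = [(3.0, 6.0), (48.0, 110.0)]

model = train(labeled)
predictions = [predict(model, features) for features in unlabeled]
print(predictions)  # [False, True]
```

The point is the shape of the workflow, not the specific algorithm: labeled examples go in, and a yes/no answer comes out for each new case.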
With Unsupervised Learning you aren’t telling the algorithm that you have Y’s and not Y’s. You’re not asking for a yes or no answer back. What you’re doing is asking the algorithm to identify patterns in the data, which you can then explore.
So it might be that you give it a whole bunch of data about shopper behavior, and you find some weird pattern that you don’t understand. And after researching it you find out those shoppers were the ones who had recently become engaged.
Unsupervised Learning, in other words, shows you new things about your data that you didn’t even know to ask about. Supervised Learning, by contrast, tells you whether new X’s are Y’s or not, once you have already taught it what a Y is.
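Here is a rough sketch of the unsupervised side, again in plain Python. It uses k-means clustering, which is just one common choice and not something specific to this post. The shopper features (store visits per month, average basket value) and the data are made up for illustration.

```python
from math import dist
from statistics import fmean

def kmeans(points, k, iterations=20):
    """Group points into k clusters. No labels are supplied anywhere."""
    # Deterministic start: take the first point, then keep adding the point
    # that is farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(fmean(column) for column in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return clusters

# Hypothetical shoppers: (store visits per month, average basket value)
shoppers = [
    (2, 30), (3, 25), (2, 35), (4, 28),
    (3, 400), (2, 450), (4, 380),
]

clusters = kmeans(shoppers, k=2)
for number, members in enumerate(clusters):
    print(number, members)
```

Notice that the output is only a set of groups. Nothing in it says why the second group spends so much more per visit; figuring that out is the research step that comes afterward.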
Great, now let’s do InfoSec
Here’s how this applies to infosec: when attackers or defenders need to confirm a known thing, they might use Supervised Learning. When they want to search for new ways to find attackers (or new victims), they will use Unsupervised Learning.
- Supervised Learning (Attacker)
  - Question: Does this fuzzing attack yield RCE?
  - Question: Is this target a qualified victim?
  - Question: Is this a honeypot or a real system?
- Supervised Learning (Defender)
  - Question: Is this a pcap of attack traffic?
  - Question: Will this user go rogue within 12 months?
  - Question: Are these logs generated by a legitimate user or an attacker?
- Unsupervised Learning (Attacker)
  - Question: Which of these fuzzing attempts should I investigate?
  - Question: Find patterns in my internet scans.
  - Question: Find patterns in these spam responses that might indicate who’s a more likely victim.
- Unsupervised Learning (Defender)
  - Question: Show me patterns in outbound DNS requests.
  - Question: Look at the frequency of outbound file uploads.
  - Question: Show me user activity in our flagship web app.
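As a small illustration of the defender’s unsupervised case, the sketch below looks at outbound DNS request counts per host and surfaces the ones that stand out, without being told what “bad” looks like. The host addresses and counts are invented, and the modified z-score with a 3.5 cutoff is one conventional outlier heuristic, not a recommendation for production detection.

```python
from statistics import median

def unusual_hosts(counts, threshold=3.5):
    """Return hosts whose count sits far from the median (modified z-score)."""
    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # Counts are too uniform to score this way.
    return sorted(
        host
        for host, v in counts.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )

# Hypothetical outbound DNS requests per internal host over one hour.
dns_queries = {
    "10.0.0.11": 120,
    "10.0.0.12": 135,
    "10.0.0.13": 110,
    "10.0.0.14": 128,
    "10.0.0.15": 4200,
    "10.0.0.16": 117,
}

flagged = unusual_hosts(dns_queries)
print(flagged)  # ['10.0.0.15']
```

The result is a lead rather than a verdict. The flagged host might be tunneling data over DNS, or it might just be a busy resolver, and someone still has to go look.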
Just as in other disciplines, the breakdown is clear: Supervised Learning gives you a yes/no to a question you already know to ask, and Unsupervised Learning gives you patterns and hints about possible new questions you should be asking.
Expect both attackers and defenders to be using both with increasing frequency in the coming years.