Dec 28, 2014
If you’ve been in Information Security for a while you have been asked for, or have seen others get asked for, an authoritative list of security metrics. This normally results in chaotic wheel invention, i.e. coming up with a new and unsatisfactory list each time.
The goal of this primer is not to make you an expert, but to provide 1) a general overview of what makes a good security metric, and 2) a list of options to use as a starting point.
This topic deserves a much more thorough treatment (see the notes), but here are some primary considerations:
Decision-enabling: the metric should enable decision-makers to actually do something. If it doesn’t, you are likely wasting resources on tracking it.
Tangible: the metric should be definable using numbers. If you can’t go beyond high, medium, or low (or some other qualitative designation) then either refine or discard it.
Narrative-supporting: the metric should lend well to the security story you are trying to tell. And this doesn’t mean the short-term story about the amazing things you did last quarter (to get a raise) but rather the overall strategy story for what the program is designed to achieve. If your metric isn’t helpful in telling that story, then ask yourself why it’s there.
Data-backed: the metric should have a solid foundation that can be explained and demonstrated to people of various levels. Nothing hurts a metric more than being pure theory or fantasy.
Repeatable: the metric should be easy to gather and update on a regular basis. Remember that gathering metrics, just as anything else in security, has a cost. Don’t spend 100K a month in person-time on harvesting and curating metrics data if it’s not worth it. Good (repeatable) metrics and automation can help swing this in the right direction.
Resource-adjusted: your metrics program should scale according to the resources you have to make changes. Similar to logging without monitoring, you want to avoid wasting effort on tracking 149 metrics if you aren’t translating their output into actual improvement.
Discrete: your metric should usually be broken down to the point where it only has data coming into it, and not other metrics. This serves the transparency and data-backing points that make it easier to explain your narrative to a skeptical audience.
Track what matters, not what is trackable: many make the mistake of finding something they can get metrics for, and then tracking that. Don’t do that. Start with what you should be tracking, and then see whether it can be measured. Anyone can track poor metrics and get no benefit from doing so.
Less is Usually More: having fewer metrics vs. many is generally better for a healthy program. This isn’t to say that you should only have a certain number, but you should only consider adding more metrics when you have achieved maximum value (with zero waste) from those you already have.
I won’t necessarily be endorsing everything listed below as a good metric, or as adhering to the guidelines above; the goal is to start the conversation and expose you to some of your options.
These I’ll call out as special in that I believe nearly every metrics program should have them.
If you only have one organizational metric, this should be it. There is no more important piece of information for your security organization than the comparison of how much you spent vs. how much value you got from the program.
Concept: Know if your program is worth it.
How to Gather It: ROSI is a simple calculation of what you spent vs. how much benefit you got from that spend in terms of risk reduced or dollars saved. The second part is the hard one, as it requires that you know how much risk you had before and after, and/or that you know how much those risk values equated to in dollars.
How to Display It: Line chart with area map is a great way to go, i.e. showing the difference between risk before and after. Another great way to do this is with a simple bar chart that shows before and after. But if you can (solidly) get to dollars saved that’s the ideal.
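As a rough illustration of the calculation described above, here is a minimal sketch in Python. The formula used (risk reduced minus program cost, divided by program cost) is one common way to express ROSI and is my assumption here, not the only valid definition; the function name and the dollar figures are made up for the example.

```python
def rosi(risk_before: float, risk_after: float, program_cost: float) -> float:
    """Return on security investment, as a fraction of what was spent.

    All three inputs are in the same currency. The risk figures are
    estimates of expected loss before and after the program.
    """
    if program_cost <= 0:
        raise ValueError("program_cost must be positive")
    risk_reduced = risk_before - risk_after
    # Net benefit (risk reduced minus spend) relative to spend.
    return (risk_reduced - program_cost) / program_cost


# Hypothetical figures: $2.0M of estimated risk reduced to $0.8M
# by a program that cost $0.5M.
example = rosi(risk_before=2_000_000, risk_after=800_000, program_cost=500_000)
print(f"ROSI: {example:.0%}")
```

The arithmetic is the easy part. The output is only as trustworthy as the before and after risk estimates you feed in.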
This metric gives you a good indicator of how large your security team is relative to what they’re securing. You’ll need to modify it according to the type of company you have, e.g., whether you’re securing hosts or applications, etc.
Concept: Know if your team is properly staffed.
How to Gather It: Determine the number of things that are being defended by the security team, whether that’s applications, hosts, etc., and then divide that by the number of people you have in the security department. There are some decent calculators out there that let you see how others are doing it.
How to Display It: This metric is often just displayed as a number or percentage for the ratio above.
Note that this is most useful as a general health-check to make sure you’re not doing something way out of the norm for no reason; it may not be one to track and adjust on a regular basis.
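The ratio itself is simple division. A minimal sketch, with a hypothetical function name and example counts:

```python
def assets_per_defender(asset_count: int, security_staff: int) -> float:
    """Number of defended things (hosts, applications, etc.) per security person."""
    if security_staff <= 0:
        raise ValueError("security_staff must be positive")
    return asset_count / security_staff


# Hypothetical organization: 1,200 hosts defended by a team of 8.
ratio = assets_per_defender(asset_count=1200, security_staff=8)
print(f"{ratio:.0f} hosts per security staff member")
```

What counts as an asset is the real decision here, so pick one unit (hosts, applications, or whatever you actually defend) and keep it consistent between measurements.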
This metric is focused on application security and captures the percentage of applications under security management (see the notes).
This example used applications, but you can do the same with “system” for a network security focus.
As described in the notes, security management means a very specific thing in this context, i.e. that the application is enrolled in the formal risk management process for that type of system—in this case, an application.
Concept: You can’t secure what you don’t know about.
How to Gather It: Compare the list of total applications you have to those that are formally part of your risk management program.
How to Display It: A simple numeric percentage is good, as is a pie chart showing the two slices (under management, and not).
There are many possible variations on this metric, with the most common being Percentage of Critical Applications Under Management. I have listed the generic form here, but it is often helpful to simply have this metric for every tier of application risk in the organization, e.g., critical, high, medium, low.
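Here is a minimal sketch of the comparison described above, including the per-tier variation. The inventory format (name and risk tier pairs), the application names, and the function names are all assumptions made for the example.

```python
from collections import Counter


def coverage_by_tier(inventory, managed):
    """Percentage of applications under security management, per risk tier.

    inventory: iterable of (application_name, tier) pairs for every known app.
    managed:   set of application names enrolled in the formal process.
    """
    totals = Counter(tier for _, tier in inventory)
    covered = Counter(tier for name, tier in inventory if name in managed)
    return {tier: 100.0 * covered[tier] / totals[tier] for tier in totals}


def overall_coverage(inventory, managed):
    """The generic form of the metric: one percentage across all applications."""
    names = [name for name, _ in inventory]
    return 100.0 * sum(1 for name in names if name in managed) / len(names)


# Hypothetical inventory and enrollment list.
inventory = [
    ("billing", "critical"), ("payroll", "critical"),
    ("crm", "high"), ("intranet", "medium"),
    ("wiki", "low"), ("blog", "low"),
]
managed = {"billing", "crm", "wiki"}

print(overall_coverage(inventory, managed))
print(coverage_by_tier(inventory, managed))
```

The hard part in practice is the inventory itself: the percentage is only meaningful if the list of total applications is complete.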
We all know that antivirus isn’t a great defense anymore. But neither are door locks, and they’re still a great idea. Basic malware defense is a joke right up until the point that you have a major incident that could have been stopped by it. The key is knowing what it is and isn’t.
Concept: Make sure you don’t lose millions of dollars in productivity to easily preventable malware.
How to Gather It: The anti-malware system likely has a console you can use.
How to Display It: This metric is often just displayed as a number or percentage. The important thing is knowing which systems to exclude from the tally, and how to note exceptions.
This tells you how far behind your organization is in updates. One key thing to keep in mind here is that applications are people too. Find a way to include their updates in this metric if at all possible.
Concept: The farther your organization is behind on patching the easier it is for an attacker to gain or expand a foothold on your network.
How to Gather It: A number of patch management systems are quite good at providing this information.
How to Display It: A bell curve would be nice, showing how many are completely up to date, how many are years behind, and the vast majority that are somewhere in between. Maybe show where you want that curve to be vs. where it is today.
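To get from raw patch data to something you can chart as a distribution, you can bucket each system by how far behind it is. A minimal sketch, assuming you can export a "days behind" figure per host from your patch management system; the bucket boundaries and host names below are arbitrary choices for illustration.

```python
from collections import Counter

# (upper bound in days, label); anything beyond the last bound is "over a year".
BUCKETS = [(0, "current"), (30, "1-30 days"), (90, "31-90 days"), (365, "91-365 days")]


def bucket(days_behind: int) -> str:
    """Map a days-behind figure to its bucket label."""
    for limit, label in BUCKETS:
        if days_behind <= limit:
            return label
    return "over a year"


def patch_distribution(days_behind_by_host):
    """Count how many hosts fall into each patch-age bucket."""
    return Counter(bucket(days) for days in days_behind_by_host.values())


# Hypothetical export: host name -> days since its oldest missing patch was released.
hosts = {"web01": 0, "web02": 12, "db01": 45, "legacy01": 400, "mail01": 30}
distribution = patch_distribution(hosts)
for _, label in BUCKETS:
    print(label, distribution[label])
print("over a year", distribution["over a year"])
```

Plotting today’s counts next to the counts you are aiming for gives you the "where we are vs. where we want to be" view.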
This is designed to tell you how much you’ve slipped from your ideal over the years with respect to things like running as non-privileged users, having the ability to use removable media, internet use, etc.
Concept: Know how much risk your exceptions are introducing into the environment.
How to Gather It: This should be pretty straightforward using any sort of configuration management tool.
How to Display It: This metric is often just displayed as a number or percentage. Nothing fancy needed. High numbers are usually bad.
Few things are as telling as making a list of all changes that took place in your organization, and looking at what percentage of them went through the formal process.
Concept: If random changes are occurring without security’s knowledge, then how will you detect an attacker?
How to Gather It: This will likely rely on a configuration management and/or continuous monitoring solution.
How to Display It: This metric is often just displayed as a number or percentage. The important thing is to separate legitimate changes from unauthorized ones.
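A minimal sketch of the calculation, assuming your monitoring tool can produce a list of observed changes and your change management system can produce the set of approved change IDs. The identifiers and the function name are invented for the example.

```python
def change_control_coverage(observed_changes, approved_change_ids):
    """Return (percentage of observed changes that were approved, unmanaged changes)."""
    if not observed_changes:
        raise ValueError("no observed changes to evaluate")
    unmanaged = [c for c in observed_changes if c not in approved_change_ids]
    managed_count = len(observed_changes) - len(unmanaged)
    return 100.0 * managed_count / len(observed_changes), unmanaged


# Hypothetical data: what monitoring saw vs. what the formal process approved.
observed = ["CHG-101", "CHG-102", "untracked-fw-rule", "CHG-104", "untracked-cron-job"]
approved = {"CHG-101", "CHG-102", "CHG-103", "CHG-104"}

pct, unmanaged = change_control_coverage(observed, approved)
print(f"{pct:.0f}% of observed changes went through the formal process")
print("Needs investigation:", unmanaged)
```

Returning the unmanaged list alongside the percentage matters, because each of those entries is either a process failure or something worse.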
This is another great maturity metric, as people have known for over a decade how important it is. So if your organization still has a significant percentage of systems using weak passwords, it’s critical to see (and fix) that.
Concept: It’s a great proxy for security maturity.
How to Gather It: There are tools for analyzing it for things like AD, but for smaller and custom systems you’ll have to do some manual analysis.
How to Display It: A bell curve would be good for this as well, with maybe five tiers of strength.
When I go into an organization as a consultant—whether it’s a Global/Fortune 10 or an SMB—I generally look for a few key things starting out.
If I only have a moment to figure out how well an organization is doing, I’ve found these three things work well:
Percentage of systems that are patched
Percentage of applications under security management
Percentage of outbound DNS traffic that is monitored and filtered
I think that organizations doing all three of these at 95% or better are likely to be extremely healthy from a security standpoint. The goal here is not to use these as a replacement for a full metrics program, but when you’re trying to get an instant feel for how mature a company is security-wise, these have served as excellent proxies.
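If you already have the three percentages, the check is trivial to automate. A minimal sketch, where the dictionary keys and the example values are hypothetical and the 95% threshold comes from the rule of thumb above:

```python
THRESHOLD = 95.0  # the "95% or better" rule of thumb


def quick_health_check(proxies, threshold=THRESHOLD):
    """Return, per proxy metric, whether it meets the threshold."""
    return {name: pct >= threshold for name, pct in proxies.items()}


# Hypothetical measurements for one organization.
proxies = {
    "systems_patched": 97.2,
    "apps_under_security_management": 88.0,
    "outbound_dns_monitored_and_filtered": 95.0,
}

result = quick_health_check(proxies)
healthy = all(result.values())
print(result, "-> likely healthy" if healthy else "-> look closer")
```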
I have a methodology I like to use with customers centered around what I call the blocking and tackling of infosec. Many consider this to be patching, and perhaps that’s true, but I’ve long since started considering that to be “showing up”, not doing the fundamentals once you’re there.
Here are some of the fundamental markers of a good security program.
Capturing log data
Collecting it centrally
Monitoring it continuously
Responding to it appropriately
Taking corrective action afterwards
And since these are fundamental markers, they also make great metrics.
The water hose isn’t 100% analogous, but it’s instructive. In general you’re unable to do the later steps if you have failed to do the earlier ones, and a failure at later stages nullifies what was done previously. It doesn’t matter if you are monitoring logs if you’re not capturing the right ones, for example, and failure to respond to an incident means you didn’t get much value from logging and monitoring it.
Here are some possible metrics that correspond to that flow:
Percentage of applications sending logs to the centralized log management system
Percentage of those logs being monitored
Chance of a critical security event being alerted on from the SOC
Chance of a critical security event being responded to by the SIRT
Percentage of readiness for post-incident management and mitigation
One way to gather these is to keep an evergreen threat matrix for everything bad you’re concerned could happen within your organization (something you should be doing anyway). And for each of those, simply walk through the five (5) steps from capturing logs to taking corrective action post-incident, and see how your existing controls would do.
For organizations of medium size or greater, i.e. those who are at least doing some basic security, I find it hard to imagine a better test of your security readiness than continuously matching your threats to your controls in this way, and then tracking the results as part of your metrics program.
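Here is a minimal sketch of that walk-through. It assumes the threat matrix is a mapping from each threat to a yes/no answer for each of the five steps, and it treats the steps as a chain, so a threat only gets credit for a step if every earlier step also holds. The threat names, step keys, and function names are all invented for the example.

```python
STEPS = ("capture", "centralize", "monitor", "respond", "correct")


def steps_completed(controls) -> int:
    """Count consecutive steps satisfied from the start; stop at the first gap."""
    count = 0
    for step in STEPS:
        if not controls.get(step, False):
            break
        count += 1
    return count


def readiness(matrix):
    """Percentage of threats for which the chain reaches each step."""
    reached = {step: 0 for step in STEPS}
    for controls in matrix.values():
        for step in STEPS[:steps_completed(controls)]:
            reached[step] += 1
    return {step: 100.0 * n / len(matrix) for step, n in reached.items()}


# Hypothetical threat matrix.
matrix = {
    "phishing": dict.fromkeys(STEPS, True),
    # Response is missing, so the later corrective action earns no credit.
    "ransomware": {"capture": True, "centralize": True, "monitor": True,
                   "respond": False, "correct": True},
    # Logs are captured but never centralized, so monitoring earns no credit.
    "insider_theft": {"capture": True, "centralize": False, "monitor": True},
    "dns_exfiltration": {"capture": False},
}

for step, pct in readiness(matrix).items():
    print(f"{step:<10} {pct:.0f}%")
```

Strict chaining is a design choice that mirrors the point about later steps depending on earlier ones. If you would rather see each control scored independently, drop the early break.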
These are metrics that are frequently mentioned in the literature that you may want to research further and consider adding to your program.
IS Budget as Percentage of IT Budget
IS Budget Spend Breakdown
Percentage of Users With Security Exceptions
Percentage of Staff Fully Trained on Infosec Awareness
Incident Detection Source Breakdown (External, Internal, Automated, Human, etc.)
Compliance Percentages (PCI/SOX/HIPAA/etc.)
Employee Behavior Metrics (hunting anomalies to correlate to risk factors)
Basic Defense Coverage Percentage
Outbound DLP Inspection Percentage
Number of Applications
Risk Assessment Coverage
Security Testing Coverage
Configuration / Change Management
Mean-time to Complete Changes
Percent of Changes with Security Review
Percentage of Changes with Security Exceptions
Number of Non-managed Changes (outside of formal process)
Total number of Incidents
Mean time to know (MTTK)
Mean-time to Incident Discovery
Percentage of Incidents Detected by Internal Controls
Percentage of Incidents Detected by Automation
Mean-time Between Security Incidents
Mean-time to Recovery
Percentage of Systems That Are Current With Patches
Patch Policy Compliance
Patch Management Coverage
Mean-time to Patch
Percentage of Critical Vulnerabilities Patched Within Policy Period
Percentage of High Vulnerabilities Patched Within Policy Period
Percentage of Medium Vulnerabilities Patched Within Policy Period
False positive rate
False negative rate
Vulnerability Scan Coverage
Percentage of Systems Without Known Severe Vulnerabilities
Mean-time to Mitigate Vulnerabilities
Time to Remediate (TTR)
[ NOTE: The reason I don’t like a number of these metrics, such as Risk Assessment Coverage, is that I believe they are covered under other metrics. In that case, any system/application that’s under security management would naturally have risk assessment coverage. This won’t always be granular enough, so it could be useful to use the individual components at times, but in most cases we want to know if our formal security process is being applied to a given system, and that’s what the primary metric above is capturing. ]
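Several of the time-based metrics in the list above, such as Mean-time to Patch and Percentage of Critical Vulnerabilities Patched Within Policy Period, reduce to simple date arithmetic once you have the raw records. A minimal sketch, assuming each record is a pair of dates (when the patch was released and when it was applied); the dates, the 25-day policy period, and the function name are made up for the example.

```python
from datetime import date
from statistics import mean


def patch_metrics(records, policy_days):
    """Return (mean days to patch, percentage patched within the policy period).

    records: list of (released_on, patched_on) date pairs for closed items.
    """
    if not records:
        raise ValueError("no patch records to evaluate")
    days = [(patched - released).days for released, patched in records]
    within_policy = sum(1 for d in days if d <= policy_days)
    return mean(days), 100.0 * within_policy / len(days)


# Hypothetical records for critical vulnerabilities.
records = [
    (date(2014, 1, 1), date(2014, 1, 11)),
    (date(2014, 2, 1), date(2014, 3, 3)),
    (date(2014, 3, 1), date(2014, 3, 21)),
]

mean_days, pct_within = patch_metrics(records, policy_days=25)
print(f"Mean-time to patch: {mean_days} days, {pct_within:.1f}% within policy")
```

This only counts items that have been patched. Anything still open needs to be tracked separately, or the averages will look better than reality.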
Running a security metrics program is both harder and easier than it seems, and most failure comes from not understanding, or losing sight of, the two (2) key purposes:
Detecting when changes to security controls need to be made
Displaying the current risk state as accurately as possible
One can argue that these two should be reversed, as you need to see X before you can change X. I prefer this order, however, because it prioritizes the goal of enabling decision makers to adjust security practices as required based on changing conditions.
That’s the key takeaway I’d like people to leave with:
I hope this has been helpful, and any recommendations for improvement would be greatly appreciated.
Notes and References
“Under security management” means that the organization has a formal process for assessing and maintaining the security of a given system throughout its lifecycle, and that the system/application in question is being managed by that process.
Many of these metrics came from two great sources on the topic of Security Metrics: Andrew Jaquith, and Dan Geer.
I recommend Andrew Jaquith’s book, Security Metrics: Replacing Fear, Uncertainty, and Doubt.
I recommend Dan Geer’s presentation, “Measuring Security”.
Other sources for metrics included nist.gov, securitymetrics.org, cisecurity.org, and csoonline.com.
This isn’t meant to be any kind of authority on security metrics, but rather a starting point for organizing your thoughts and further research.
I have been an information security professional for the last 15 years with exposure to many of the world’s largest companies. I have spent most of that time as a penetration tester, auditor, and program consultant, and any perspective or opinion given above is based on that experience.