Calabrese’s Razor

I’ve long held the opinion that “Information Security Experts” agree with each other 90% of the time, but waste 90% of their time arguing to the death with one another about the remaining 10%.  This was painfully brought home to me several years ago as I was facilitating the consensus process around the Solaris Security document published by the Center for Internet Security.  You won’t believe the amount of time we spent arguing about seemingly trivial things like, “Should the system respond to echo broadcast?”  And as the consensus circle widened, we ended up wasting more time on these issues and repeating debates over and over again as new people joined the discussion.  In short, it was killing us.  People were burning out and failing to provide constructive feedback, and we were failing to deliver updates in a timely fashion.

I see these kinds of debates causing similar mayhem in the IT Ops and InfoSec groups at many organizations.  The problem is that in these cases the organizations are not simply debating the content of a document full of security recommendations, they’re arguing about matters of operational policy.  This seems to promote even more irrational passions, and also raises the stakes for failing to come to consensus and actually move forward.

At the low point of our crisis at the Center for Internet Security, the person who was most responsible for finding the solution was Chris Calabrese, who was facilitating the HP-UX benchmark for the Center. At roughly the same time as our issues at the Center, the IT Ops and InfoSec teams at Chris’ employer had gotten bogged down over similar kinds of issues and had decided to come up with an objective metric for deciding which information security controls were important and which ones were just not worth arguing about.  Suddenly the discussion of these issues was transformed from matters of opinion to matters of fact.  Consensus arrived quickly and nobody’s feelings got hurt.

Overview of the Metric

So we decided to adapt the metric that Chris had used to our work at the Center.  After some discussion, we decided that the metric had to account for two major factors: how important the security control was and how much negative operational impact the security control would impose.  Each of the two primary factors was made up of other components.

For example, the factors relating to the relative importance of a security control include:

  • Impact (I): Is the attack just a denial-of-service condition, or does it allow the attacker to actually gain access to the system? Does the attack allow privileged access?
  • Radius (R): Does the attack require local access or can it be conducted in an unauthenticated fashion over the network?
  • Effectiveness (E): Does the attack work against the system’s standard configuration, or is the control in question merely a backup in case of common misconfiguration, or even just a “defense in depth” measure that only comes into play after the failure of multiple controls?

Similarly, the negative operational impact of a control was assessed based on two factors:

  • Administrative Impact (A): Would the change require significant changes to current administrative practice?
  • Frequency of Impact (F): How regularly would this impact be felt by the Operations teams?

The equation for deciding which controls were important simply evolved to: “(I * R * E) – (A * F)”.  In other words, multiply the terms related to the importance of the control to establish a positive value and then subtract the costs due to the administrative impact of the control.

The only thing missing was the actual numbers.  It turns out a very simple weighting scheme is sufficient:

  • Impact (I): Score 1 if attack is a denial-of-service, 2 if the attack allows unprivileged access, and 3 if the attack allows administrative access (or access to an admin-equivalent account like “oracle”, etc.)
  • Radius (R): Score 1 for attacks that require physical access or post-authenticated unprivileged access, and 2 for remote attacks that can be conducted by unauthenticated users
  • Effectiveness (E): Score 1 if the control requires multiple configuration failures to be relevant, 2 if the control is a standard second-order defense for common misconfiguration, and 3 if the attack would succeed against standard configurations without the control in question
  • Administrative Impact (A): Score 1 if the administrative impact is insignificant or none, 2 if the control requires modifications to existing administrative practice, and 3 if the control would completely disable standard administrative practices in some way
  • Frequency of Impact (F): Score 1 if the administrative impact is to a non-standard process or arises less than once per month, 2 if the administrative impact is to a standard but infrequent process that occurs about once per month, and 3 if the impact is to a regular or frequent administrative practice
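
The formula and weights above fit in a few lines of code.  What follows is a minimal Python sketch, not anything we actually used at the Center; the names `TERM_RANGES` and `razor_score` are made up for illustration.

```python
# Allowed score range for each term, taken from the weighting scheme above.
# TERM_RANGES and razor_score are illustrative names, not part of the original work.
TERM_RANGES = {
    "I": (1, 3),  # Impact
    "R": (1, 2),  # Radius
    "E": (1, 3),  # Effectiveness
    "A": (1, 3),  # Administrative Impact
    "F": (1, 3),  # Frequency of Impact
}


def razor_score(I, R, E, A, F):
    """Return (I * R * E) - (A * F) after checking that each term is in range."""
    terms = {"I": I, "R": R, "E": E, "A": A, "F": F}
    for name, value in terms.items():
        low, high = TERM_RANGES[name]
        if not (isinstance(value, int) and low <= value <= high):
            raise ValueError(
                f"{name} must be an integer from {low} to {high}, got {value!r}"
            )
    return (I * R * E) - (A * F)


# The two extremes the scheme allows.
best = razor_score(I=3, R=2, E=3, A=1, F=1)   # 18 - 1
worst = razor_score(I=1, R=1, E=1, A=3, F=3)  # 1 - 9
print(best, worst)
```

With these weights the score can range from -8 to 17, which gives a sense of scale for the numbers that follow.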

In the case where a single control can have different levels of impact in different scenarios, what turned out best for us (and avoided the most arguments) was to simply choose the highest justifiable value for each term, even if that value was not the most common or likely impact.
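
One way to read that rule is as a per-term maximum taken across every scenario you can justify.  The sketch below assumes that reading; the helper name `worst_case_terms` and the two scenarios are invented purely for illustration.

```python
TERMS = ("I", "R", "E", "A", "F")


def worst_case_terms(scenarios):
    """Take the highest value seen for each term across all scenarios."""
    return {term: max(s[term] for s in scenarios) for term in TERMS}


def razor_score(t):
    """Compute (I * R * E) - (A * F) from a dict of term values."""
    return (t["I"] * t["R"] * t["E"]) - (t["A"] * t["F"])


# Two hypothetical scenarios for the same control (values invented for illustration).
scenarios = [
    {"I": 1, "R": 2, "E": 2, "A": 1, "F": 3},  # the common case
    {"I": 3, "R": 1, "E": 3, "A": 2, "F": 1},  # a rarer, nastier case
]

terms = worst_case_terms(scenarios)
print(terms, razor_score(terms))
```

Note that the combined terms need not match any single real scenario.  The rule applies to the cost terms (A and F) as well as the benefit terms, so the pessimism cuts both ways.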

Applying the Metric

Let’s run the numbers on a couple of controls and see how this works out.  First we’ll try a “motherhood and apple pie” kind of control– disabling unencrypted administrative access like telnet:

  • Impact (I): Worst case scenario here is that an attacker hijacks an administrative session and gains control of the remote system.  So that’s administrative level access, meaning a score of 3 for this term.
  • Radius (R): Anybody on the network could potentially perform this attack, so this term is set to 2.
  • Effectiveness (E): Again you have to go with the maximal rating here, because the session hijacking threat is a standard “feature” of clear-text protocols– score 3.
  • Administrative Impact (A): Remember, we’re not discussing replacing clear-text administrative protocols with encrypted protocols at this point (justifying encrypted access is a separate conversation).  We’re discussing disabling unencrypted access, so the score here is 3 because we’re planning on completely disabling this administrative practice.
  • Frequency of Impact (F): If telnet is your regular router access scheme, then this change is going to impact you every day.  Again, the score is then 3.

So what’s the final calculation?  Easy: (3 * 2 * 3) – (3 * 3) = 9.  What’s that number mean?  Before I answer that question, let’s get another point of comparison by looking at a more controversial control.

We’ll try my own personal nemesis, the dreaded question of whether the system should respond to echo broadcast packets:

  • Impact (I): Worst case scenario here ends up being a denial of service attack (e.g. “smurf” type attack), so score 1.
  • Radius (R): Depends on whether or not your gateways are configured to pass directed broadcast traffic (hint: they shouldn’t be), but let’s assume the worst case and score this one a 2.
  • Effectiveness (E): Again, being as pessimistic as possible, let’s assume no other compensating controls in the environment and score this one a 3.
  • Administrative Impact (A): The broadcast ping supporters claim that disabling broadcast pings makes it more difficult to assess claimed IP addresses on a network and capture MAC addresses from systems (the so-called “ARP shotgun” approach).  Work-arounds are available, however, so let’s score this one a 2.
  • Frequency of Impact (F): In this case, we have what essentially becomes a site-specific answer.  But let’s assume that your network admins use broadcast pings regularly and score this one a 3.

So the final answer for disabling broadcast pings is: (1 * 2 * 3) – (2 * 3) = 0.  You could quibble about some of the terms, but I doubt you’re going to be able to make a case for this one scoring any higher than a 2 or so.
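
For anybody who wants to check the arithmetic, here are both worked examples as a short Python sketch.  The term values come straight from the two lists above; the function name `razor_score` and the control labels are just illustrative.

```python
def razor_score(I, R, E, A, F):
    """Compute (I * R * E) - (A * F)."""
    return (I * R * E) - (A * F)


# Term values taken from the two worked examples above.
controls = {
    "disable telnet": dict(I=3, R=2, E=3, A=3, F=3),
    "ignore echo broadcast": dict(I=1, R=2, E=3, A=2, F=3),
}

for name, terms in controls.items():
    print(f"{name}: {razor_score(**terms)}")
```

This prints 9 for disabling telnet and 0 for ignoring echo broadcasts, matching the hand calculations.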

Interpreting the Scores

Once we followed this process and produced scores for all of the various controls in our document, a dominant pattern emerged.  The controls that everybody agreed with had scores of 3 or better.  The obviously ineffective controls were scoring 0 or less.  That left items with scores in the 1-2 range as being “on the bubble”, and indeed many of these items were generating our most enduring arguments.

What was also clear was that it wasn’t worth arguing about the items that only came in at 1 or 2.  Most of these ended up being “second-order” type controls for issues that could be mitigated in other ways much more effectively and with much less operational impact.  So we made an organizational decision to simply ignore any items that failed to score at least 3.
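
The bands and the cut-off described above could be written down as follows.  This is a sketch under the assumption that scores are whole numbers; `ADOPT_THRESHOLD` and `classify` are illustrative names, and the band labels are shorthand.

```python
ADOPT_THRESHOLD = 3  # the cut-off the group settled on


def classify(score):
    """Map a score to (band, keep) following the pattern described above."""
    if score >= ADOPT_THRESHOLD:
        return ("agreed", True)          # everybody agreed with these controls
    if score >= 1:
        return ("on the bubble", False)  # scores of 1-2: not worth arguing about
    return ("ineffective", False)        # 0 or less


for score in (9, 2, 0):
    print(score, classify(score))
```

Only the first band is kept.  The "on the bubble" label is retained separately from "ineffective" simply to record where the arguments used to be.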

As far as arguments about the weighting of the individual terms, these tended to be few and far between.  Part of this was our adoption of a “when in doubt, use the maximum justifiable value” stance, and part of it was due to choosing a simple weighting scheme that didn’t leave much room for debate.  Also, once you start plugging the numbers in, it’s obvious that arguing over a 1 point change in a single term isn’t usually enough to counteract the other factors enough to get a given control to reach the overall qualifying score of 3.

Further Conclusions

What was also interesting about this process is that it gave us an objective measure for challenging the “conventional wisdom” about various security controls.  It’s one thing to say, “We should always do control X”, and quite another to have to plug numbers for the various terms related to “control X” into a spreadsheet.  It quickly becomes obvious when a control has minimal security impact in the real world.

This metric also channelled our discussion into much more productive and much less emotional avenues.  Even the relatively coarse granularity of our instrument was sufficient to break our “squishy” matters of personal opinion into discrete, measurable chunks.  And once you get engineers talking numbers, you know a solution is going to emerge eventually.

So when your organization finds itself in endless, time-wasting discussions regarding operational controls, try applying Chris’ little metric and see if you don’t rapidly approach something resembling clarity.  Your peers will thank you for injecting a little sanity into the proceedings.

Chris Calabrese passed away a little more than a year ago from a sudden and massive heart attack, leaving behind a wife and children.  His insight and quiet leadership are missed by all who knew him.  While Chris developed this metric in concert with his co-workers and later with the input of the participants in the Center for Internet Security’s consensus process, I have chosen to name the metric “Calabrese’s Razor” in his memory.

4 thoughts on “Calabrese’s Razor”

  1. The following are comments from one of my colleagues to whom I’ve sent your Website’s URL. The context in which these comments were jotted down is that of the Government of Canada. I would be interested in your response. – Anton
    Interesting concept… From a quick read, it seems that this “metric” is not a performance measurement metric, but an analytical tool to help in the selection of controls. A numerical value is assigned to a security control based on its perceived “bang for the buck”, including considerations for negative impact of deploying the control (i.e. productivity costs).

    In some applications, it could be a useful analytical framework to help the development of a control framework that addresses risks identified through a risk assessment. However it seems that some of the factors being considered in the metric were actually factored in the risk assessment (i.e. attack impact, attack radius) and would already influence the selection of safeguards to address specific risks. I also see other limits to its application: controls rarely work in isolation from each other and the metric does not factor inter-dependencies. When assigning the Impact measure (the way it is currently defined), one would have to assume that other controls are functioning properly. For the Effectiveness measure, if a control is secondary to another control, I wonder what makes the other control the primary control (i.e. if two controls provide defence in depth, which is the primary and which is the secondary control)? I think that to have any usefulness, the metric would have to be further refined. Not all factors that should be considered in the selection of a safeguard are being measured. The actual numerical assignments would also have to be more generic.

    At first glance and as currently defined, I think that this metric might be useful when examining controls with a very narrow scope (e.g. security configuration of a particular operating system) but likely less useful to look at a broad range of technical, operational and management controls designed to address system- or Enterprise-level security risks. The broader the scope of a risk assessment, the harder it would be to apply this kind of metric (and the less useful it would be). For the typical departments & agencies, I suspect that this level of formalization would not be beneficial. I would go with a simpler cost / benefit approach to assign to each control a relative risk reduction (effectiveness) and cost (including procurement, operating, maintenance and productivity costs). When I say “relative”, I mean 1,2,3 or High/Med/Low (vs trying to actually measure ROI). This would help selection and coarse prioritization of controls for implementation. This analysis is done informally by practitioners when designing a control framework. Explicitly assigning values to these metrics would make the rationale for control selection and prioritization more visible to the decision-makers.

  2. Anton, your colleagues’ comments are well-taken. We developed this metric as a tactical solution to channel our discussions at CIS into a more productive framework. But, as your colleagues point out, we were focused on a relatively small set of controls in a narrow domain. Plus we were primarily interested in understanding relative value between the different controls, not assigning a hard and fast number that would represent the “absolute” value of a particular control.

    It’s clear to me that this metric doesn’t replace a more comprehensive risk assessment framework that’s necessary for setting Information Security strategy. However, I am interested in your colleagues’ notion of trying to frame the results of such a risk assessment into a similar, simple set of parameters for rolling up the results to present to management. This could be fruitful.

    In case your colleagues weren’t already aware, there are other techniques that attempt to provide a risk assessment “dashboard” for management. For example, the STAR program that fellow SANS Instructor Randy Marchany developed for Virginia Tech University provides “traffic light” type markers (green, yellow, red) for understanding and prioritizing business risks.

  3. Hal,

    This is interesting work. The biggest concern I see with the analysis is that it assumes that every control is 100% effective. To go back to your cleartext telnet example, it’s obvious that the attack must be mitigated because the threat is clear, but there is no value given to *which* control to apply. Suppose I propose that we replace telnet with a new version that does an XOR with a static value. The data stream is now encrypted (poorly), but the attack is not really mitigated. It would be useful to extend this technique so that it helps decision makers evaluate different mitigations to a threat.

  4. I agree that having a tool to help choose between different compensating controls would be useful. In a sense the (E)ffectiveness term encapsulates some of what you’re looking for, though discriminating between various controls for the same issue was not a problem we were trying to address at the time.
