Calabrese’s Razor

February 26, 2009

Hal Pomeranz, Deer Run Associates

I’ve long held the opinion that the community of “Information Security Experts” agree with each other 90% of the time, but waste 90% of their time arguing to the death with other InfoSec Experts about the remaining 10%.  This was painfully brought home to me several years ago as I was facilitating the consensus process around the Solaris Security document published by the Center for Internet Security.  You won’t believe the amount of time we spent arguing about seemingly trivial things like, “Should the system respond to echo broadcast?”  And as the consensus circle widened, we ended up wasting more time on these issues and repeating debates over and over again as new people joined the discussion.  In short, it was killing us.  People were burning out and failing to provide constructive feedback and we were failing to deliver updates in a timely fashion.

I see these kind of debates causing similar mayhem in the IT Ops and InfoSec groups at many organizations.  The problem is that in these cases the organizations are not simply debating the content of a document full of security recommendations, they’re arguing about matters of operational policy.  This seems to promote even more irrational passions, and also raises the stakes for failing to come to consensus and actually move forward.

At the low point of our crisis at the Center for Internet Security, the person who was most responsible for finding the solution was Chris Calabrese, who was facilitating the HP-UX benchmark for the Center. At roughly the same time as our issues at the Center, the IT Ops and InfoSec teams at Chris’ employer had gotten bogged down over similar kinds of issues and had decided to come up with an objective metric for deciding which information security controls were important and which ones were just not worth arguing about.  Suddenly the discussion of these issues was transformed from matters of opinion to matters of fact.  Consensus arrived quickly and nobody’s feelings got hurt.

Overview of the Metric

So we decided to adapt the metric that Chris had used to our work at the Center.  After some discussion, we decided that the metric had to account for two major factors: how important the security control was and how much negative operational impact the security control would impose.  Each of the two primary factors was made up of other components.

For example, the factors relating to the relative importance of a security control include:

  • Impact (I): Is the attack just a denial-of-service condition, or does it allow the attacker to actually gain access to the system? Does the attack allow privileged access?
  • Radius (R): Does the attack require local access or can it be conducted in an unauthenticated fashion over the network?
  • Effectiveness (E): Does the attack work against the system’s standard configuration, or is the control in question merely a backup in case of common misconfiguration, or even just a “defense in depth” measure that only comes into play after the failure of multiple controls?

Similarly, the administrative impact of a control was assessed based on two factors:

  • Administrative Impact (A): Would the change require significant changes to current administrative practice?
  • Frequency of Impact (F): How regularly would this impact be felt by the Operations teams?

The equation for deciding which controls were important simply evolved to: “(I * R * E) – (A* F)”.  In other words, multiply the terms related to the importance of the control to establish a positive value and then subtract the costs due to the administrative impact of the control.

The only thing missing was the actual numbers.  It turns out a very simple weighting scheme is sufficient:

  • Impact (I): Score 1 if attack is a denial-of-service, 2 if the attack allows unprivileged access, and 3 if the attack allows administrative access (or access to an admin-equivalent account like “oracle”, etc)
  • Radius (R): Score 1 for attacks that require physical access or post-authenticated unprivileged access, and 2 for remote attacks that can be conducted by unauthenticated users
  • Effectiveness (E): Score 1 if the control requires multiple configuration failures to be relevant, 2 if the control is a standard second-order defense for common misconfiguration, and 3 if the attack would succeed against standard configurations without the control in question
  • Administrative Impact (A): Score 1 if the administrative impact is insignificant or none, 2 if the control requires modifications to existing administrative practice, and 3 if the control would completely disable standard administrative practices in some way
  • Frequency of Impact (F): Score 1 if the administrative impact is to a non-standard process or arises less than once per month, 2 if the administrative impact is to a standard but infrequent process that occurs about once per month, and 3 if the impact is to a regular or frequent administrative practice

In the case where a single control can have different levels of impact in different scenarios, what turned out best for us (and avoided the most arguments) was to simply choose the highest justifiable value for each term, even if that value was not the most common or likely impact.

Applying the Metric

Let’s run the numbers on a couple of controls and see how this works out.  First we’ll try a “motherhood and apple pie” kind of control– disabling unencrypted administrative access like telnet:

  • Impact (I): Worst case scenario here is that an attacker hijacks an administrative session and gains control of the remote system.  So that’s administrative level access, meaing a score of 3 for this term.
  • Radius (R): Anybody on the network could potentially perform this attack, so this term is set to 2.
  • Effectiveness (E): Again you have to go with the maximal rating here, because the session hijacking threat is a standard “feature” of clear-text protocols– score 3.
  • Administrative Impact (A): Remember, we’re not discussing replacing clear-text administrative protocols with encrypted protocols at this point (justifying encrypted access is a separate conversation).  We’re discussing disabling unencrypted access, so the score here is 3 because we’re planning on completely disabling this administrative practice.
  • Frequency of Impact (F): If telnet is your regular router access scheme, then this change is going to impact you every day.  Again, the score is then 3.

So what’s the final calculation?  Easy: (3 * 2 * 3) – (3 * 3) = 9.  What’s that number mean?  Before I answer that question, let’s get another point of comparison by looking at a more controversial control.

We’ll try my own personal nemesis, the dreaded question of whether the system should respond to echo broadcast packets:

  • Impact (I): Worst case scenario here ends up being a denial of service attack (e.g. “smurf” type attack), so score 1.
  • Radius (R): Depends on whether or not your gateways are configured to pass directed broadcast traffic (hint: they shouldn’t be), but let’s assume the worst case and score this one a 2.
  • Effectiveness (E): Again, being as pessimistic as possible, let’s assume no other compensating controls in the environment and score this one a 3.
  • Administrative Impact (A): The broadcast ping supporters claim that disabling broadcast pings makes it more difficult to assess claimed IP addresses on a network and capture MAC addresses from systems (the so-called “ARP shotgun” approach).  Work-arounds are available, however, so let’s score this one a 2.
  • Frequency of Impact (F): In this case, we have what essentially becomes a site-specific answer.  But let’s assume that your network admins use broadcast pings regularly and score this one a 3.

So the final answer for disabling broadcast pings is: (1 * 2 * 3) – (2 * 3) = 0.  You could quibble about some of the terms, but I doubt you’re going to be able to make a case for this one scoring any higher than a 2 or so.

Interpreting the Scores

Once we followed this process and produced scores for all of the various controls in our document, a dominant pattern emerged.  The controls that everybody agreed with had scores of 3 or better.  The obviously ineffective controls were scoring 0 or less.  That left items with scores in the 1-2 range as being “on the bubble”, and indeed many of these items were generating our most enduring arguments.

What was also clear was that it wasn’t worth arguing about the items that only came in at 1 or 2.  Most of these ended up being “second-order” type controls for issues that could be mitigated in other ways much more effectively and with much less operational impact.  So we made an organizational decision to simply ignore any items that failed to score at least 3.

As far as arguments about the weighting of the individual terms, these tended to be few and far between.  Part of this was our adoption of a “when in doubt, use the maximum justifiable value” stance, and part of it was due to choosing a simple weighting scheme that didn’t leave much room for debate.  Also, once you start plugging the numbers in, it’s obvious that arguing over a 1 point change in a single term isn’t usually enough to counteract the other factors enough to get a given control to reach the overall qualifying score of 3.

Further Conclusions

What was also interesting about this process is that it gave us an objective measure for challenging the “conventional wisdom” about various security controls.  It’s one thing to say, “We should always do control X”, and quite another to have to plug numbers for the various terms related to “control X” into a spreadsheet.  It quickly becomes obvious when a control has minimal security impact in the real world.

This metric also channelled our discussion into much more productive and much less emotional avenues.  Even the relatively coarse granularity of our instrument was sufficient to break our “squishy” matters of personal opinion into discrete, measurable chunks.  And once you get engineers talking numbers, you know a solution is going to emerge eventually.

So when your organization finds itself in endless, time-wasting discussions regarding operational controls, try applying Chris’ little metric and see if you don’t rapidly approach something resembling clarity.  Your peers will thank you for injecting a little sanity into the proceedings.

Chris Calbrese passed away a little more than a year ago from a sudden and massive heat attack, leaving behind a wife and children.  His insight and quiet leadership are missed by all who knew him.  While Chris developed this metric in concert with his co-workers and later with the input of the participants in the Center for Internet Security’s consenus process, I have chosen to name the metric “Calabrese’s Razor” in his memory.