May 17, 2010
I read an article this morning about on-line banking fraud that was so awful it prompted me to dust off the Righteous IT blog and write about it. Sure, it’s a sponsored article from a financial industry site and not really journalism, so maybe I shouldn’t expect too much. But the problem is that the misinformation in this article– which is so typical of other articles related to on-line banking fraud– is actually hampering our ability to make the situation better.
Let’s start with the “money quote” in the article from F-Secure’s Sean Sullivan: “Last year there were more online bank robberies than there were actual on-site bank robberies.” You can be sure that this quote is going to get a lot of airplay on Twitter and in the popular press. I even understand what Sean is saying here– there were numerically more cases of on-line banking fraud than there were physical hold-ups at banking institutions. I might even believe this.
The problem with the quote is that it ignores one important point. When a real-world bank is held up, the bank’s insurance covers the cost of any losses. When a small business is the victim of on-line banking fraud, the bank is not legally obligated to make good the loss– a fact that is even noted later in the original article. The reality is that in on-line banking fraud, the bank is not the victim, their customers are. So while the quote is surely an attention-grabber, it ignores the critical fact that in the on-line world, the financial institutions have managed to transfer a whole lot of risk squarely into the laps of their customers.
The article goes on to extol the virtues of multi-factor authentication systems, including passwords, keys, security questions, personalized pictures, and so on. We even get another quote from Sean Sullivan: “The more layers you have before you get to your account, the safer you are.” Really? Then why does Sullivan also state in the same article, “Some more advanced types of Trojans can make fraudulent transfers and drain your account while you are logged on to the account online.”
The reality that banks may not want to admit right now is that readily available malware kits like Zeus are completely bypassing the bank’s on-line security protocols. This happens because the attacker has simply taken over the victim’s machine and is using the victim’s own credentials to conduct the fraudulent transactions. It doesn’t matter how many “layers” you have when the attackers own the victim’s system. To borrow Bruce Schneier’s phrase, all of those hoops that your on-line bank makes you jump through are not much more than “security theatre” at this point.
Finally there’s the standard wrap-up for an article of this type: the dreaded “How to Help Protect Your Account” list of bullet items. These lists always include advice on keeping your anti-virus/anti-spyware up-to-date and turning on auto-updates (it’s #2 in the list in this article). Well guess what? Perhaps more than half of the PCs infected with the Zeus banking malware had up-to-date virus signatures and patches.
And of course there’s the exhortation to “Use a strong password with letters and numbers combined.” How exactly is a strong password going to help you when the attackers learn what the password is as soon as you enter it into your web browser? Can we please stop suggesting that passwords– strong or otherwise– are going to help here?
At this point, I’m not sure there’s a way for normal users to achieve a reasonable level of security for on-line banking. The “attack surface” of a typical home computer is so vast that attackers will find a way to compromise the system. The best suggestion I’ve heard floated to date– using a dedicated computer for on-line banking– seems too expensive to be reasonable for home users, or even a typical small business (to say nothing of the inconvenience factor).
The bottom line is that current on-line security measures are not stopping thieves. We need to stop publishing articles that suggest that there is some magic litany of security steps an average user can take to make their on-line banking secure. If users were to abandon on-line banking– which is a huge money-saver for financial institutions compared to bricks and mortar branches with live tellers– you can bet that the banks might actually start working on some more effective security measures.
Similarly, as long as the banks can keep pushing their liability onto their customers, they have no incentive to fix the problem. We need more customers who are willing to go after their banks to recover their lost funds. Small business groups should agitate for the same sorts of protections that are afforded to individual accounts. By pushing the liability back onto the financial institutions, we make it more likely that the banks will actually spend their own money beefing up their on-line security measures and back-end fraud detection.
April 23, 2009
I was recently asked to make a guest appearance on a podcast related to information security in “the cloud”. One of the participants brought up an interesting anecdote from one of his clients. Apparently the IT group at this company had been approached by a member of their marketing team who was looking for some compute resources to tackle a big data crunching exercise. The IT group responded that they were already overloaded and it would be months before they could get around to providing the necessary infrastructure. Rebuffed but undeterred, the marketing person used their credit card to purchase sufficient resources from Amazon’s EC2 to process the data set and got the work done literally overnight for a capital cost of approximately $1800.
There ensued the predictable horrified gasping from us InfoSec types on the podcast. Nothing is more terrifying than skunkworks IT, especially on infrastructure not under our direct control. “Didn’t they realize how insecure it was to do that?” “What will happen when all of our users realize how easily and conveniently they can do this?” “How can an organization control this type of risky behavior?” We went to bed immersed in our own paranoid but comfortable world-view.
Since then, however, I’ve had the chance to talk with other people about this situation. In particular, my friend John Sechrest delivered an intellectual “boot to the head” that’s caused me to consider the situation in a new light. Apparently getting the data processed in a timely fashion was so critical to the marketing department that they figured out their own self-service plan for obtaining the IT resources they needed. If the project was that critical, John asked, was it reasonable from a business perspective for the IT group to effectively refuse to help their marketing department crunch this data?
Maybe the IT group really was overloaded– most of them are these days. However, the business of the company still needs to move forward, and the clever problem-solving monkeys in various parts of the organization will figure out ways to get their jobs done even without IT support. “Didn’t they realize how insecure it was to do that?” No, and they didn’t care. They needed to accomplish a goal, and they did.
“What will happen when all of our users realize how easily and conveniently they can do this?” My guess is they’re going to start doing it a lot more. Maybe that’s a good thing. If the IT group is really overloaded, then perhaps it should think about actually empowering its users to do these kinds of “one off” or prototype projects on their own without draining the resources of the core IT group. Remember that if you let a thousand IT projects bloom, 999 of them are going to wither and die shortly thereafter. Perhaps IT doesn’t need to waste time managing the death of the 999.
“How can an organization control this type of risky behavior?” You probably can’t. So perhaps your IT group should provide a secure offering that’s so compelling that your users will want to use your version rather than the commodity offerings that are so readily available. This solution will have to be tailored to each company, but I think it starts with things like:
- Pre-configured images with known baseline configurations and relevant tools so that groups can get up and running quickly without having to build and upload their own images.
- Easy toolkits for migrating data into and out of these images in a secure fashion, with some sort of DLP solution baked in.
- Secure back-end storage to protect the data at rest in these images with no extra work on the part of the users.
- Integration with the organization’s existing identity management and/or AAA framework so that users don’t have to re-implement their own solutions.
- Integration with the organization’s auditing and logging infrastructures so you know what’s going on.
Putting together the kind of framework described above is a major IT project, and will require input and participation from your user community. But once accomplished, it could provide massive leverage to overtaxed IT organizations. Rather than IT having to engineer everything themselves, they provide secure self-service building blocks to their customers and let them have at it.
Providing architecture support and guidance in the early stages of each project is probably prudent. After all, the one hardy little flower that blooms and refuses to die may become a critical resource that eventually needs to be moved back “in house”. It will help that the building blocks used to create the service are already well-integrated with the organization’s centralized IT infrastructure, but a reasonable architectural design from the start will also make a big difference when it comes time to migrate and continue scaling the service.
Am I advocating skunkworks IT? No, I like to think I’m advocating self-service IT on a grand scale. You’ll see what skunkworks IT looks like if you ignore this issue and just let your users develop their own solutions because you’re too busy to help them.
March 23, 2009
Some months ago, a fellow Information Security professional posted to one of the mailing lists I monitor, looking for security arguments to refute the latest skunkworks project from her sales department. Essentially, one of the sales folks had developed a thick client application that connected to an internal customer database. The plan was to equip all of the sales agents in the field with this application and allow them to connect directly back through the corporate firewall to the production copy of the database over an unencrypted link. This seemed like a terrible idea, and the poster was looking to marshal arguments against deploying this software.
The predictable discussion ensued, with everybody on the list enumerating the many reasons why this was a bad idea from an InfoSec perspective and in some cases suggesting work-arounds to spackle over deficiencies in the design of the system. My advice was simpler– refute the design on Engineering principles rather than InfoSec grounds. Specifically:
- The system had no provision for allowing the users to work off-line or when the corporate database was unavailable.
- While the system worked fine in the corporate LAN environment, bandwidth and latency issues over the Internet would probably render the application unusable.
Sure enough, when confronted with these reasonable engineering arguments, the project was scrapped as unworkable. The Information Security group didn’t need to waste any of their precious political capital shooting down this obviously bad idea.
This episode ties into a motto I’ve developed during my career: “Never sell security as security.” In general, Information Security only gets a limited number of trump cards they can play to control the architecture and deployment of all the IT-related projects in the pipeline. So anything they can do to create IT harmony and information security without exhausting their hand is a benefit.
It’s also useful to consider my motto when trying to get funding for Information Security related projects. It’s been my experience that many companies will only invest in Information Security a limited number of times: “We spent $35K on a new firewall to keep the nasty hackers at bay and that’s all you get.” To achieve the comprehensive security architecture you need to keep your organization safe, you need to get creative about aligning security procurement with other business initiatives.
For example, file integrity assessment tools like Tripwire have an obvious forensic benefit when a security incident occurs, but the up-front cost of acquiring, deploying, and using these tools just for the occasional forensic benefit often makes them a non-starter for organizations. However, if you change the game and point out that the primary ongoing benefit of these tools is as a control on your own change management processes, then they become something that the organization is willing to pay for. You’ll notice that the nice folks at Tripwire realized this long ago and sell their software as “Configuration Control”, not “Security”.
Sometimes you can get organizational support from even further afield. I once sold an organization on using sudo with the blessings of Human Resources because it streamlined their employee termination processes: nobody knew the root passwords, so the passwords didn’t need to get changed every time somebody from IT left the company. When we ran the numbers, this turned out to be a significant cost-savings for the company.
So be creative and don’t go into every project with your Information Security blinders on. There are lots of projects in the pipeline that may be bad ideas from an Information Security perspective, but it’s likely that they have other problems as well. You can use those problems as leverage to implement architectures that are more efficient and rational from an Engineering as well as from an Information Security perspective. Similarly there are critical business processes that the Information Security group can leverage to implement necessary security controls without necessarily spending Information Security’s capital (or political) budget.
March 2, 2009
At the end of our recent SANS webcast, Mike Poor closed by emphasizing how important it was for IT and Information Security groups to advertise their operational successes to the rest of the organization (and also to their own people). Too often these functions are seen as pure cost centers, and in these difficult economic times it’s up to these organizations to demonstrate return value or face severe cutbacks.
The question is what are the right metrics to publish in order to indicate success? All too often I see organizations publishing meaningless metrics, or even metrics that create negative cultures that damage corporate perception of the organization:
- It seems like a lot of IT Ops groups like to publish their “look how much stuff we operate” metrics: so many thousand machines, so many petabytes of disk, terabytes of backup data per week, etc. The biggest problem with these metrics is that they can be used to justify massive process inefficiencies. Maybe you have thousands of machines because every IT project buys its own hardware and you’re actually wasting money and resources that could be saved by consolidating. Besides, nobody else in the company cares how big your… er, server farm is.
- Then there are the dreaded help desk ticket metrics: tickets closed per week, average time to close tickets, percentage of open tickets, etc. The only thing these metrics do is incentivize your help desk to do a slapdash job and thereby annoy your customers. There’s only one help desk metric that matters: customer satisfaction. If you’re not doing customer satisfaction surveys on EVERY TICKET and/or you’re not getting good results then you fail.
So what are some good metrics? Well I’m a Visible Ops kind of guy, so the metrics that matter to me are things like amount of unplanned downtime (drive to zero), number of successful changes requiring no unplanned work or firefighting (more is better), number of unplanned or unauthorized changes (drive to zero), and projects completed on time and on-budget (more is better). Of course, if your IT organization is struggling, you might be tempted to NOT publish these metrics because they show that you’re not performing well. In these cases, accentuate the positive by publishing your improvement numbers rather than the raw data: “This month we had 33% less unplanned downtime than last month.” This makes your organization look proactive and creates the right cultural imperatives without airing your dirty laundry.
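The improvement number in that example is just a relative change between two reporting periods. Here is a minimal Python sketch of the calculation. The function name and the downtime figures are hypothetical, chosen only so the arithmetic lands on the 33% quoted above.

```python
def percent_reduction(previous: float, current: float) -> float:
    """Percentage by which `current` is lower than `previous`.

    A negative result means the metric got worse, not better.
    """
    if previous <= 0:
        raise ValueError("previous must be positive to compute a relative change")
    return (previous - current) / previous * 100.0


# Hypothetical figures: minutes of unplanned downtime in each month.
last_month = 90.0
this_month = 60.0

improvement = percent_reduction(last_month, this_month)
print(f"This month we had {improvement:.0f}% less unplanned downtime than last month.")
```

Publishing the relative change rather than the raw minutes is the point: the trend is visible without the underlying numbers being on display.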
There are a couple of other places where I never fail to toot my own horn:
- If my organization makes life substantially better for another part of the company then you’d better believe I’m going to advertise that fact. For example, when my IT group put together a distributed build system that cut product compiles down from over eight hours to less than one hour, it not only went into our regular status roll-ups, but I also got the head of the Release Engineering group to give us some testimonials as well.
- Whenever a significant new security vulnerability comes out that is not an issue for us because of our standard builds and/or operations environment, I make sure the people who provide my budget know about it. It also helps if you can point to “horror story” articles about the amount of money other organizations have had to pay to clean up after incidents related to the vulnerability. This is one of the few times that Information Security can demonstrate direct value to the organization, and you must never miss out on these chances.
What’s That Smell?
If communicating your successes builds a corporate perception of your organization’s value, being transparent about your failures builds trust with the rest of the business. If you try to present a relentlessly positive marketing spin on your accomplishments your “customers” elsewhere in the company will become suspicious. Plus you’ll never bamboozle them sufficiently with your wins that they won’t notice the elephant in the room when you fall on your face.
The important things to communicate when you fail are that you understand what led to the failure, that you have the situational awareness to understand the impact of the failure on the business, and the steps you’re taking to make sure that the same failure never happens again (the only real organizational failure is allowing the same failure to happen twice). Here’s a simple checklist of items you should have in your disclosure statement:
- Analysis of the process(es) that led to the failure
- The duration of the outage
- How the outage was detected
- The systems and services impacted
- Which business units were impacted and in what way
- Actions taken to end the outage
- Corrective processes to make sure it never happens again
Note that in some cases it’s necessary to split the disclosure across 2-3 messages. One is sent during the incident telling your constituents, “Yes, there’s a problem and we’re working it.” The next is the “services restored at time X, more information forthcoming” message. And then finally your complete post-mortem report. Try to avoid partial or incomplete disclosure or idle speculation without all of the facts– you’ll almost always end up with egg on your face.
If you don’t communicate what’s happening in your IT and/or InfoSec organization then the other business units are basically going to assume you’re not doing anything during the time when you’re not directly working on their requests. This leads to the perception of IT as nothing more than “revenue sucking pigs”.
However, you also have to communicate in the right way. This means communicating worthwhile metrics and metrics which don’t create bad cultural imperatives for your organization. And it also means being transparent and communicating your failures– in the most proactive way possible– to the rest of the organization.
February 26, 2009
I’ve long held the opinion that the community of “Information Security Experts” agree with each other 90% of the time, but waste 90% of their time arguing to the death with other InfoSec Experts about the remaining 10%. This was painfully brought home to me several years ago as I was facilitating the consensus process around the Solaris Security document published by the Center for Internet Security. You won’t believe the amount of time we spent arguing about seemingly trivial things like, “Should the system respond to echo broadcast?” And as the consensus circle widened, we ended up wasting more time on these issues and repeating debates over and over again as new people joined the discussion. In short, it was killing us. People were burning out and failing to provide constructive feedback and we were failing to deliver updates in a timely fashion.
I see these kinds of debates causing similar mayhem in the IT Ops and InfoSec groups at many organizations. The problem is that in these cases the organizations are not simply debating the content of a document full of security recommendations, they’re arguing about matters of operational policy. This seems to promote even more irrational passions, and also raises the stakes for failing to come to consensus and actually move forward.
At the low point of our crisis at the Center for Internet Security, the person who was most responsible for finding the solution was Chris Calabrese, who was facilitating the HP-UX benchmark for the Center. At roughly the same time as our issues at the Center, the IT Ops and InfoSec teams at Chris’ employer had gotten bogged down over similar kinds of issues and had decided to come up with an objective metric for deciding which information security controls were important and which ones were just not worth arguing about. Suddenly the discussion of these issues was transformed from matters of opinion to matters of fact. Consensus arrived quickly and nobody’s feelings got hurt.
Overview of the Metric
So we decided to adapt the metric that Chris had used to our work at the Center. After some discussion, we decided that the metric had to account for two major factors: how important the security control was and how much negative operational impact the security control would impose. Each of the two primary factors was made up of other components.
For example, the factors relating to the relative importance of a security control include:
- Impact (I): Is the attack just a denial-of-service condition, or does it allow the attacker to actually gain access to the system? Does the attack allow privileged access?
- Radius (R): Does the attack require local access or can it be conducted in an unauthenticated fashion over the network?
- Effectiveness (E): Does the attack work against the system’s standard configuration, or is the control in question merely a backup in case of common misconfiguration, or even just a “defense in depth” measure that only comes into play after the failure of multiple controls?
Similarly, the administrative impact of a control was assessed based on two factors:
- Administrative Impact (A): Would the change require significant changes to current administrative practice?
- Frequency of Impact (F): How regularly would this impact be felt by the Operations teams?
The equation for deciding which controls were important simply evolved to: “(I * R * E) – (A * F)”. In other words, multiply the terms related to the importance of the control to establish a positive value and then subtract the costs due to the administrative impact of the control.
The only thing missing was the actual numbers. It turns out a very simple weighting scheme is sufficient:
- Impact (I): Score 1 if attack is a denial-of-service, 2 if the attack allows unprivileged access, and 3 if the attack allows administrative access (or access to an admin-equivalent account like “oracle”, etc)
- Radius (R): Score 1 for attacks that require physical access or post-authenticated unprivileged access, and 2 for remote attacks that can be conducted by unauthenticated users
- Effectiveness (E): Score 1 if the control requires multiple configuration failures to be relevant, 2 if the control is a standard second-order defense for common misconfiguration, and 3 if the attack would succeed against standard configurations without the control in question
- Administrative Impact (A): Score 1 if the administrative impact is insignificant or none, 2 if the control requires modifications to existing administrative practice, and 3 if the control would completely disable standard administrative practices in some way
- Frequency of Impact (F): Score 1 if the administrative impact is to a non-standard process or arises less than once per month, 2 if the administrative impact is to a standard but infrequent process that occurs about once per month, and 3 if the impact is to a regular or frequent administrative practice
In the case where a single control can have different levels of impact in different scenarios, what turned out best for us (and avoided the most arguments) was to simply choose the highest justifiable value for each term, even if that value was not the most common or likely impact.
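The weighting scheme is small enough to capture in a few lines of code. What follows is a minimal Python sketch, not the spreadsheet we actually used: the class name, field names, and the example control are assumptions made for illustration. Only the formula and the allowed ranges for each term come from the description above.

```python
from dataclasses import dataclass

# Allowed (low, high) range for each term, per the weighting scheme above.
ALLOWED = {
    "impact": (1, 3),
    "radius": (1, 2),
    "effectiveness": (1, 3),
    "admin_impact": (1, 3),
    "frequency": (1, 3),
}


@dataclass(frozen=True)
class ControlScore:
    """One security control scored as (I * R * E) - (A * F)."""

    impact: int         # I
    radius: int         # R
    effectiveness: int  # E
    admin_impact: int   # A
    frequency: int      # F

    def __post_init__(self):
        # Reject values outside the agreed weighting scheme.
        for name, (low, high) in ALLOWED.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    def benefit(self) -> int:
        return self.impact * self.radius * self.effectiveness

    def cost(self) -> int:
        return self.admin_impact * self.frequency

    def total(self) -> int:
        return self.benefit() - self.cost()


# Hypothetical control: unprivileged access, local only, second-order defense,
# with negligible and infrequent administrative impact.
example = ControlScore(impact=2, radius=1, effectiveness=2, admin_impact=1, frequency=1)
print(example.total())  # 3
```

With these ranges the score can never fall below -8 (1 minus 9) or rise above 17 (18 minus 1), which keeps the whole exercise on a scale small enough to argue about sensibly.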
Applying the Metric
Let’s run the numbers on a couple of controls and see how this works out. First we’ll try a “motherhood and apple pie” kind of control– disabling unencrypted administrative access like telnet:
- Impact (I): Worst case scenario here is that an attacker hijacks an administrative session and gains control of the remote system. So that’s administrative level access, meaning a score of 3 for this term.
- Radius (R): Anybody on the network could potentially perform this attack, so this term is set to 2.
- Effectiveness (E): Again you have to go with the maximal rating here, because the session hijacking threat is a standard “feature” of clear-text protocols– score 3.
- Administrative Impact (A): Remember, we’re not discussing replacing clear-text administrative protocols with encrypted protocols at this point (justifying encrypted access is a separate conversation). We’re discussing disabling unencrypted access, so the score here is 3 because we’re planning on completely disabling this administrative practice.
- Frequency of Impact (F): If telnet is your regular router access scheme, then this change is going to impact you every day. Again, the score is then 3.
So what’s the final calculation? Easy: (3 * 2 * 3) – (3 * 3) = 9. What’s that number mean? Before I answer that question, let’s get another point of comparison by looking at a more controversial control.
We’ll try my own personal nemesis, the dreaded question of whether the system should respond to echo broadcast packets:
- Impact (I): Worst case scenario here ends up being a denial of service attack (e.g. “smurf” type attack), so score 1.
- Radius (R): Depends on whether or not your gateways are configured to pass directed broadcast traffic (hint: they shouldn’t be), but let’s assume the worst case and score this one a 2.
- Effectiveness (E): Again, being as pessimistic as possible, let’s assume no other compensating controls in the environment and score this one a 3.
- Administrative Impact (A): The broadcast ping supporters claim that disabling broadcast pings makes it more difficult to assess claimed IP addresses on a network and capture MAC addresses from systems (the so-called “ARP shotgun” approach). Work-arounds are available, however, so let’s score this one a 2.
- Frequency of Impact (F): In this case, we have what essentially becomes a site-specific answer. But let’s assume that your network admins use broadcast pings regularly and score this one a 3.
So the final answer for disabling broadcast pings is: (1 * 2 * 3) – (2 * 3) = 0. You could quibble about some of the terms, but I doubt you’re going to be able to make a case for this one scoring any higher than a 2 or so.
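For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two calculations above. The function name is an assumption made for illustration. The term values for the two controls are the ones chosen in the walkthroughs, and the final line is an assumed variation showing one way the broadcast ping score could be quibbled upward.

```python
def control_score(i: int, r: int, e: int, a: int, f: int) -> int:
    """Score a control as (I * R * E) - (A * F)."""
    return (i * r * e) - (a * f)


# Disabling unencrypted administrative access (telnet).
telnet = control_score(i=3, r=2, e=3, a=3, f=3)

# Disabling responses to echo broadcast packets.
broadcast_ping = control_score(i=1, r=2, e=3, a=2, f=3)

# Assumed variation: admins only use broadcast pings about once a month (F = 2).
broadcast_ping_monthly = control_score(i=1, r=2, e=3, a=2, f=2)

print(telnet)                  # 9
print(broadcast_ping)          # 0
print(broadcast_ping_monthly)  # 2
```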
Interpreting the Scores
Once we followed this process and produced scores for all of the various controls in our document, a dominant pattern emerged. The controls that everybody agreed with had scores of 3 or better. The obviously ineffective controls were scoring 0 or less. That left items with scores in the 1-2 range as being “on the bubble”, and indeed many of these items were generating our most enduring arguments.
What was also clear was that it wasn’t worth arguing about the items that only came in at 1 or 2. Most of these ended up being “second-order” type controls for issues that could be mitigated in other ways much more effectively and with much less operational impact. So we made an organizational decision to simply ignore any items that failed to score at least 3.
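The cut-offs described above amount to a three-way triage. A minimal Python sketch follows. The function name and the labels are assumptions made for illustration, and the "borderline" control is hypothetical; the thresholds and the two named scores are the ones given in the text.

```python
def triage(score: int) -> str:
    """Bucket a control by its overall score."""
    if score >= 3:
        return "keep"                     # everybody agreed with these
    if score >= 1:
        return "on the bubble (ignored)"  # not worth arguing about
    return "ineffective"                  # 0 or less


scores = {
    "disable telnet": 9,
    "disable echo broadcast response": 0,
    "hypothetical borderline control": 2,
}

for name, score in scores.items():
    print(f"{name}: {score} -> {triage(score)}")
```

In practice the last two buckets were treated the same way, since anything that failed to reach 3 was simply left out.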
As far as arguments about the weighting of the individual terms, these tended to be few and far between. Part of this was our adoption of a “when in doubt, use the maximum justifiable value” stance, and part of it was due to choosing a simple weighting scheme that didn’t leave much room for debate. Also, once you start plugging the numbers in, it’s obvious that a 1 point change in a single term usually isn’t enough to counteract the other factors and get a given control to the overall qualifying score of 3.
What was also interesting about this process is that it gave us an objective measure for challenging the “conventional wisdom” about various security controls. It’s one thing to say, “We should always do control X”, and quite another to have to plug numbers for the various terms related to “control X” into a spreadsheet. It quickly becomes obvious when a control has minimal security impact in the real world.
This metric also channelled our discussion into much more productive and much less emotional avenues. Even the relatively coarse granularity of our instrument was sufficient to break our “squishy” matters of personal opinion into discrete, measurable chunks. And once you get engineers talking numbers, you know a solution is going to emerge eventually.
So when your organization finds itself in endless, time-wasting discussions regarding operational controls, try applying Chris’ little metric and see if you don’t rapidly approach something resembling clarity. Your peers will thank you for injecting a little sanity into the proceedings.
Chris Calabrese passed away a little more than a year ago from a sudden and massive heart attack, leaving behind a wife and children. His insight and quiet leadership are missed by all who knew him. While Chris developed this metric in concert with his co-workers and later with the input of the participants in the Center for Internet Security’s consensus process, I have chosen to name the metric “Calabrese’s Razor” in his memory.
February 17, 2009
Ed Skoudis and Mike Poor were kind enough to invite me to sit in on their recent SANS webcast round-table about emerging security threats. During the webcast I was discussing some emerging attack trends against the Linux kernel, which I thought I would also jot down here for those of you who don’t have time to sit down and listen to the webcast recording.
Over the last several months, I’ve been observing a noticeable uptick in the number of denial-of-service (DoS) conditions reported in the Linux kernel. What that says to me is that there are groups out there who are scrutinizing the Linux kernel source code looking for vulnerabilities. Frankly, I doubt they’re after DoS attacks– it’s much more interesting to find an exploit that gives you control of computing resources rather than one that lets you take them away from other people.
Usually when people go looking for vulnerabilities in an OS kernel they’re looking for privilege escalation attacks. The kernel is often the easiest way to get elevated privileges on the system. Indeed, in the past few weeks there have been a couple of fixes for local privilege escalation vulnerabilities checked into the Linux kernel code. So not only are these types of vulnerabilities being sought after, they’re being found (and probably used).
Now “local privilege escalation” means that the attacker has already found their way into the system as an unprivileged user. Which raises the question, how are the attackers achieving their first goal of unprivileged access? Well, certainly there are enough insecure web apps running on Linux systems for attackers to have a field day. But as I was pondering possible attack vectors, I had an uglier thought.
A lot of the public Cloud Computing providers make virtualized Linux images available to their customers. The Cloud providers have to allow essentially unlimited open access to their services to anybody who wants it– this is, after all, their entire business model. So in this scenario, the attacker doesn’t need an exploit to get unprivileged access to a Unix system: they get it as part of the Terms of Service.
What worries me is attackers that pair their local privilege escalation exploits with some sort of “virtualization escape” exploit, allowing them hypervisor level access to the Cloud provider’s infrastructure. That’s a nightmare scenario, because now the attacker potentially has access to other customers’ jobs running in that computing infrastructure in a way that will likely be largely undetectable by those customers.
Now please don’t mistake me. As far as we know, this scenario has not occurred. Furthermore, I’m willing to believe that the Cloud providers supply generally higher levels of security than many of their customers could do on their own (the Cloud providers having the resources to get the “pick of the litter” when it comes to security expertise). At the same time, the scenario I paint above has got to be an attractive one for attackers, and it’s possible we’re seeing the precursor traces of an effort to mount such an attack in the future.
So to all of you playing around in the Clouds I say, “Watch the skies!”