Skunkworks in the Clouds

April 23, 2009

Hal Pomeranz, Deer Run Associates

Skunks... clouds... get it?

I was recently asked to make a guest appearance on a podcast related to information security in “the cloud”.  One of the participants brought up an interesting anecdote from one of his clients.  Apparently the IT group at this company had been approached by a member of their marketing team who was looking for some compute resources to tackle a big data crunching exercise.  The IT group responded that they were already overloaded and it would be months before they could get around to providing the necessary infrastructure.  Rebuffed but undeterred, the marketing person used their credit card to purchase sufficient resources from Amazon’s EC2 to process the data set and got the work done literally overnight for a capital cost of approximately $1800.

There ensued the predictable horrified gasping from us InfoSec types on the podcast.  Nothing is more terrifying than skunkworks IT, especially on infrastructure not under our direct control.  “Didn’t they realize how insecure it was to do that?” “What will happen when all of our users realize how easily and conveniently they can do this?” “How can an organization control this type of risky behavior?” We went to bed immersed in our own paranoid but comfortable world-view.

Since then, however, I’ve had the chance to talk with other people about this situation.  In particular, my friend John Sechrest delivered an intellectual “boot to the head” that’s caused me to consider the situation in a new light.  Apparently getting the data processed in a timely fashion was so critical to the marketing department that they figured out their own self-service plan for obtaining the IT resources they needed. If the project was that critical, John asked, was it reasonable from a business perspective for the IT group to effectively refuse to help their marketing department crunch this data?

Maybe the IT group really was overloaded; most of them are these days.  However, the business of the company still needs to move forward, and the clever problem-solving monkeys in various parts of the organization will figure out ways to get their jobs done even without IT support.  “Didn’t they realize how insecure it was to do that?”  No, and they didn’t care.  They needed to accomplish a goal, and they did.

“What will happen when all of our users realize how easily and conveniently they can do this?” My guess is they’re going to start doing it a lot more.  Maybe that’s a good thing.  If the IT group is really overloaded, then perhaps it should think about actually empowering its users to do these kinds of “one off” or prototype projects on their own without draining the resources of the core IT group.  Remember that if you let a thousand IT projects bloom, 999 of them are going to wither and die shortly thereafter.  Perhaps IT doesn’t need to waste time managing the death of the 999.

“How can an organization control this type of risky behavior?” You probably can’t.  So perhaps your IT group should provide a secure offering that’s so compelling that your users will want to use your version rather than the commodity offerings that are so readily available.  This solution will have to be tailored to each company, but I think it starts with things like the following (there’s a rough sketch of the idea after the list):

  • Pre-configured images with known baseline configurations and relevant tools so that groups can get up and running quickly without having to build and upload their own images.
  • Easy toolkits for migrating data into and out of these images in a secure fashion, with some sort of DLP solution baked in.
  • Secure back-end storage to protect the data at rest in these images with no extra work on the part of the users.
  • Integration with the organization’s existing identity management and/or AAA framework so that users don’t have to re-implement their own solutions.
  • Integration with the organization’s auditing and logging infrastructures so you know what’s going on.
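
Just as a rough illustration of what a “secure self-service building block” might look like, here is a minimal Python sketch (using the boto3 library) that launches an instance from an IT-maintained baseline image into a locked-down security group, tagged for ownership and chargeback.  The image ID, key pair, security group, and tag values are all hypothetical placeholders, not a recommendation of any particular provider or API.

    # Minimal sketch, not production code: assumes boto3 is installed and
    # credentials are already configured.  All IDs and names are hypothetical.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    response = ec2.run_instances(
        ImageId="ami-0abc1234",            # hypothetical IT-approved baseline image
        InstanceType="m5.large",           # hypothetical instance size
        MinCount=1,
        MaxCount=1,
        KeyName="marketing-team-key",      # hypothetical centrally managed key pair
        SecurityGroupIds=["sg-0def5678"],  # hypothetical locked-down security group
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [
                {"Key": "Owner", "Value": "marketing"},
                {"Key": "Project", "Value": "data-crunch"},
                {"Key": "CostCenter", "Value": "1234"},
            ],
        }],
    )

    print("Launched self-service instance:", response["Instances"][0]["InstanceId"])

The value to IT isn’t the half-dozen lines of code; it’s that the image, the network policy, and the tagging conventions are decided once, centrally, and every self-service launch inherits them.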

Putting together the kind of framework described above is a major IT project, and will require input and participation from your user community.  But once accomplished, it could provide massive leverage to overtaxed IT organizations.  Rather than IT having to engineer everything themselves, they provide secure self-service building blocks to their customers and let them have at it.

Providing architecture support and guidance in the early stages of each project is probably prudent.  After all, the one hardy little flower that blooms and refuses to die may become a critical resource that eventually needs to be moved back “in house”.  The fact that the service was built from building blocks already well-integrated with the organization’s centralized IT infrastructure will help, but a reasonable architectural design from the start will also be a huge help when it comes time to migrate and continue scaling the service.

Am I advocating skunkworks IT?  No, I like to think I’m advocating self-service IT on a grand scale.  You’ll see what skunkworks IT looks like if you ignore this issue and just let your users develop their own solutions because you’re too busy to help them.

Hal Pomeranz, Deer Run Associates

Recently my pal Bill Schell and I were gassing on about the current and future state of IT employment, and he brought up the topic of IT jobs being “lost to the Cloud”.  In other words, if we’re to believe in the marketing hype of the Cloud Computing revolution, a great deal of processing is going to move out of the direct control of the individual organizations where it is currently being done.  One would expect IT jobs within those organizations that had previously been supporting that processing to disappear, or at least migrate over to the providers of the Cloud Computing resources.

I commented that the whole Cloud Computing story felt just like another turn in the epic cycle between centralized and decentralized computing.  He and I had both lived through the end of the mainframe era, into “Open Systems” on user desktops, back into centralized computing with X terminals and other “thin clients”, back out onto the desktops again with the rise of extremely powerful, extremely low cost commodity hardware, and now we’re harnessing that commodity hardware into giant centralized clusters that we’re calling “Clouds”.  It’s amazingly painful for the people whose jobs and lives are dislocated by these tectonic shifts in computing practice, but the wheel keeps turning.

Bill brought up an economic argument for centralized computing that seems to crop up every time the pendulum starts swinging back toward centralization.  Essentially, the argument runs as follows (there’s a toy cost comparison after the list):

  • As the capital cost of computing power declines, support costs tend to predominate.
  • Centralized support costs less than decentralized support.
  • Therefore centralized computing models will ultimately win out.
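
To make the shape of that argument concrete, here is a toy calculation with entirely made-up numbers (they illustrate the logic, nothing more): once the hardware line item is small, the support line item dominates the total, and the cheaper support model wins.

    # Toy numbers, purely hypothetical, to illustrate the argument above.
    machines = 1000
    capital_per_machine = 500                  # hardware cost per seat

    decentralized_support_per_machine = 400    # per-desktop support cost per year
    centralized_support_per_machine = 150      # shared-infrastructure support cost per year

    decentralized_total = machines * (capital_per_machine + decentralized_support_per_machine)
    centralized_total = machines * (capital_per_machine + centralized_support_per_machine)

    print("Decentralized:", decentralized_total)   # 900000
    print("Centralized:  ", centralized_total)     # 650000

With numbers like these the centralized model wins easily, which is exactly the argument’s point.  The rest of this post is about the costs that calculation leaves out.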

If you believe this argument, by now we should have all embraced a centralized computing model.  Yet instead we’ve seen this cycle between centralized and decentralized computing.  What’s driving the cycle?  It seems to me that there are other factors that work in opposition and keep the wheel turning.

First, it’s generally been a truism that centralized computing power costs more than the equivalent decentralized computing power.  In other words, it’s more expensive to hook 64 processors and 128GB of RAM onto the same backplane than it is to purchase 64 uniprocessor machines each with 2GB of RAM.  The Cloud Computing enthusiasts are promising to crack that problem by “loosely coupling” racks of inexpensive machines into a massive computing array.  Though when “loose” is defined as InfiniBand switch fabrics and the like, you’ll forgive me if I suspect they may be playing a little Three Card Monte with the numbers on the cost spreadsheets.  The other issue to point out here is that if your “centralized” computing model is really just a rack of “decentralized” servers, you’re giving up some of the savings in support costs that the centralized computing model was supposed to provide.

Another issue that rises to the fore when you move to a centralized computing model is the cost to the organization of maintaining access to the centralized computing resource.  One obvious cost area is basic “plumbing” like network access: how much is it going to cost you to get all the bandwidth you need (in both directions) at appropriately low latency?  Similarly, when your compute power is decentralized it’s easier to hide environmental costs like power and cooling than when all of those machines are racked up together in the same room.  A less obvious cost is keeping the centralized computing resource up and available all the time, because now, with all of your “eggs in one basket” as it were, your entire business can be impacted by a single outage.  “Five-nines” uptime is really, really expensive.  Back when your eggs were spread out across multiple baskets, you didn’t necessarily care as much about the uptime of any single basket, and the aggregate cost of keeping all the baskets available when needed was lower.
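
To put a number on just how demanding “five-nines” is, here is a quick back-of-the-envelope calculation of the downtime each availability level allows in a year (straight arithmetic, no assumptions beyond a 365-day year):

    # Downtime budget per year at various availability levels.
    minutes_per_year = 365 * 24 * 60   # 525,600

    for label, availability in [("two nines", 0.99),
                                ("three nines", 0.999),
                                ("four nines", 0.9999),
                                ("five nines", 0.99999)]:
        downtime = minutes_per_year * (1 - availability)
        print("%-12s %8.1f minutes of downtime per year" % (label, downtime))

Two nines gives you about three and a half days of slack per year; five nines gives you barely five minutes, and that budget has to cover power, cooling, network, storage, and every maintenance window for the whole facility.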

The centralized vs. decentralized cycle keeps turning because in any given computing epoch the costs of all of the above factors rise and fall.  This leads IT folks to optimize one factor over another, which promotes shifts in computing strategy, and the wheel turns again.

Despite what the marketeers would have you believe, I don’t think the Cloud Computing model has proven itself to the point where there is a massive impact on the way mainstream business is doing IT.  This may happen, but then again it may not.  The IT job loss we’re seeing now has a lot more to do with the general problems in the world-wide economy than jobs being “lost to the Cloud”.  But it’s worth remembering that massive changes in computing practice do happen on a regular basis, and IT workers need to be able to read the cycles and position themselves appropriately in the job market.

Hal Pomeranz, Deer Run Associates

Ed Skoudis and Mike Poor were kind enough to invite me to sit in on their recent SANS webcast round-table about emerging security threats.  During the webcast I was discussing some emerging attack trends against the Linux kernel, which I thought I would also jot down here for those of you who don’t have time to sit down and listen to the webcast recording.

Over the last several months, I’ve been observing a noticeable uptick in the number of denial-of-service (DoS) conditions reported in the Linux kernel.  What that says to me is that there are groups out there scrutinizing the Linux kernel source code looking for vulnerabilities.  Frankly, I doubt they’re after DoS attacks; it’s much more interesting to find an exploit that gives you control of computing resources rather than one that lets you take them away from other people.

Usually when people go looking for vulnerabilities in an OS kernel, they’re looking for privilege escalation attacks.  The kernel is often the easiest way to get elevated privileges on the system.  Indeed, in the past few weeks there have been a couple of fixes [1] [2] for local privilege escalation vulnerabilities checked into the Linux kernel code.  So not only are these types of vulnerabilities being sought after, they’re being found (and probably used).

Now “local privilege escalation” means that the attacker has already found their way into the system as an unprivileged user.  That raises the question: how are the attackers achieving their first goal of unprivileged access?  Certainly there are enough insecure web apps running on Linux systems for attackers to have a field day.  But as I was pondering possible attack vectors, I had an uglier thought.

A lot of the public Cloud Computing providers make virtualized Linux images available to their customers.  The Cloud providers have to allow essentially unlimited open access to their services to anybody who wants it– this is, after all, their entire business model.  So in this scenario, the attacker doesn’t need an exploit to get unprivileged access to a Unix system: they get it as part of the Terms of Service.

What worries me is attackers pairing their local privilege escalation exploits with some sort of “virtualization escape” exploit, giving them hypervisor-level access to the Cloud provider’s infrastructure.  That’s a nightmare scenario, because now the attacker potentially has access to other customers’ jobs running in that computing infrastructure in a way that will likely be largely undetectable by those customers.

Now please don’t mistake me.  As far as we know, this scenario has not occurred.  Furthermore, I’m willing to believe that the Cloud providers supply generally higher levels of security than many of their customers could achieve on their own (the Cloud providers having the resources to get the “pick of the litter” when it comes to security expertise).  At the same time, the scenario I paint above has got to be an attractive one for attackers, and it’s possible we’re seeing the precursor traces of an effort to mount such an attack in the future.

So to all of you playing around in the Clouds I say, “Watch the skies!”