Hal Pomeranz, Deer Run Associates

Tell me what this is:

TCP Header in Lego(TM)

TCP Header in Lego(TM)

If you said, “Hey! That’s a TCP header diagram in Lego(TM)”, or perhaps, “Holy &^%@! That idiot made a TCP header diagram in Lego(TM)!”, then you’re exactly right!  This is another one of those wild, wacky ideas that we dreamed up in the middle of one of my SANS classes (note to the SANS staff: shorter breaks might be a good idea).  I bet my students never thought I’d actually do it.

Of course, you know I couldn’t stop with just doing the TCP header:

IP Header in Lego(TM)

IP Header in Lego(TM)

Now why am I wasting all that space on the building plate in each case?  Why so you can put them together of course:

TCP/IP in Lego(TM)

TCP/IP in Lego(TM)

The use of color here really highlights certain portions of the packet header.  For example, the source and destination addresses and ports really jump out.  But there are some other, more subtle color patterns that I worked in here.  For example, if you look closely you’ll see that I matched the color of the ACK bit with the blue in the ACK number field.  Similarly the colors of the SYN bit and the sequence number match, as do the URG bit and urgent pointer field.

Actually I wish I had a couple of more colors available.  Yes, Lego(TM) comes in dozens of colors these days, but they only make 2×8 blocks (aka one “Lego(TM) Byte”) in six colors: White, Black, Red, Yellow, Blue, and Beige.

Lego(TM) Byte, Nibble, and Bit

Lego(TM) Byte, Nibble, and Bit

So while I tried to use Beige exclusively for size fields, Red for reserved bits, Yellow for checksums, and so on, I ultimately ended up having to use these colors for other fields as well– for example, the yellow sequence number fields in the TCP header.  Maybe I should have just bought a bunch of “nibbles” (2×4 blocks) in other colors and not been so choosy about using full “Lego(TM) Bytes”.

Serious Fun

Cute idea, but is there any practical value?  After a lengthy conversation with my inner child (who is generally more mature than my outer persona), I realized that there was a fun learning game we could make out of all this.  So I labelled all the blocks.  Yes, that’s right. I. Labelled. Every. Single. Block. I even did the individual bits:

TCP Header "Bits" in Lego(TM). Labelled.

TCP Header "Bits" in Lego(TM). Labelled.

So the game becomes learning where all the fields are in the various packet headers so that you can re-create the packet diagrams from piles that look like this:

Lego(TM) blocks waiting to become TCP header diagram

Lego(TM) blocks waiting to become TCP header diagram

Now we can teach students how to decode packet headers by letting them play with Legos(TM).  And that means we can all write off our Lego(TM) collections as a business expense!  How cool is that?

Admit It.  You Can’t Wait To Do It Too!

Project Tools

Project Tools

If you’ve got a hankering to try this out for yourself, it doesn’t take a whole lot.  I way overbought on the Lego front: six green base-plates, and 20 2×8 “Lego(TM) Bytes”, 8 “Nibbles” (2×4 blocks), and 16 “Bits” in each color.  Total cost for the Lego(TM) was around US$100 delivered.

Labelling was accomplished with my P-Touch(TM) labeller.  3/8″ ribbon is precisely the right height to be placed on the side of a Lego(TM) block.  It also helps to have a razor blade type tool to help separate the P-Touch(TM) labels from their backing and apply them to the blocks.

And of course I have to give a shout-out to the late, great Richard Stevens and his biblical tome TCP/IP Illustrated: Vol 1.  If you don’t already own this book, buy it. Seriously.

Final Thoughts

Finally, to all of you who think I need a life, all I have to say is:

Labelled TTL "Lego(TM) Byte"

Labelled TTL "Lego(TM) Byte"

Baby, I’m living the dream!

Hal Pomeranz, Deer Run Associates

Not too long ago, I was bemoaning the fact that my mother still uses floppy disks and that it’s becoming harder and harder to get her set up with a computer that actually has a floppy drive.  Then one of my friends suggested a brilliant idea, which I’ve only just gotten around to implementing:

Floppy Disk w/ Low-Tech USB Adaptor

“Just tell her it’s an adaptor!”, my friend suggested.  Heck man, not only is it an adaptor, it also spectacularly increases the amount of stuff you can store on that floppy!  I wonder if my Mom will go for it?

Not News is Bad News

May 17, 2010

Hal Pomeranz, Deer Run Associates

I read an article this morning about on-line banking fraud that was so awful it prompted me to dust off the Righteous IT blog and write about it.  Sure, it’s a sponsored article from a financial industry site and not really journalism, so maybe I shouldn’t expect too much.  But the problem is that the misinformation in this article– which is so typical of other articles related to on-line banking fraud– is actually hampering our ability to make the situation better.

Let’s start with the “money quote” in the article from F-Secure’s Sean Sullivan: “Last year there were more online bank robberies than there were actual on-site bank robberies.”  You can be sure that this quote is going to get a lot of airplay on Twitter and in the popular press.  I even understand what Sean is saying here– there were numerically more cases of on-line banking fraud than there were physical hold-ups at banking institutions.  I might even believe this.

The problem with the quote is that it ignores one important point.  When a real-world bank is held up, the bank’s insurance covers the cost of any losses.  When a small business is the victim of on-line banking fraud, the bank is not legally obligated to make good the loss– a fact that is even noted later in the original article.  The reality is that in on-line banking fraud, the bank is not the victim, their customers are.  So while the quote is surely an attention-grabber, it ignores the critical fact that in the on-line world, the financial institutions have managed to transfer a whole lot of risk squarely into the laps of their customers.

The article goes on to extol the virtues of multi-factor authentication systems, including passwords, keys, security questions, personalized pictures, and so on.  We even get another quote from Sean Sullivan: “The more layers you have before you get to your account, the safer you are.”  Really? Then why does Sullivan also state in the same article, “Some more advanced types of Trojans can make fraudulent transfers and drain your account while you are logged on to the account online.”

The reality that banks may not want to admit right now is that readily available malware kits like Zeus are completely bypassing the bank’s on-line security protocols.  This happens because the attacker has simply taken over the victim’s machine and is using the victim’s own credentials to conduct the fraudulent transactions. It doesn’t matter how many “layers” you have when the attackers own the victim’s system.  To borrow Bruce Schneier’s phrase, all of those hoops that your on-line bank makes you jump through are not much more than “security theatre” at this point.

Finally there’s the standard wrap-up for an article of this type: the dreaded “How to Help Protect Your Account” list of bullet items.  These lists always include advice on keeping your anti-virus/anti-spyware up-to-date and turning on auto-updates (it’s #2 in the list in this article).  Well guess what?  Perhaps more than half of the PCs infected with the Zeus banking malware had up-to-date virus signatures and patches.

And of course there’s the exhortation to “Use a strong password with letters and numbers combined.”  How exactly is a strong password going to help you when the attackers learn what the password is as soon as you enter it into your web browser?  Can we please stop suggesting that passwords– strong or otherwise– are going to help here?

At this point, I’m not sure there’s a way for normal users to achieve a reasonable level of security for on-line banking.  The “attack surface” of a typical home computer is so vast that attackers will find a way to compromise the system.  The best suggestion I’ve heard floated to date– using a dedicated computer for on-line banking– seems too expensive to be reasonable for home users, or even a typical small business (to say nothing of the inconvenience factor).

The bottom line is that current on-line security measures are not stopping thieves. We need to stop publishing articles that suggest that there is some magic litany of security steps an average user can take to make their on-line banking secure.  If users were to abandon on-line banking– which is a huge money-saver for financial institutions compared to bricks and mortar branches with live tellers– you can bet that the banks might actually start working on some more effective security measures.

Similarly, as long as the banks can keep pushing their liability onto their customers, they have no incentive to fix the problem.  We need more customers who are willing to go after their banks to recover their lost funds.  Small business groups should agitate for the same sorts of protections that are afforded to individual accounts.  By pushing the liability back onto the financial institutions, we make it more likely that the banks will actually spend their own money beefing up their on-line security measures and back-end fraud detection.

Policy Matters

August 12, 2009

Hal Pomeranz, Deer Run Associates

Hey, I read a great policy today! I bet you can count on the fingers of one kneecap the number of times you’ve said that in your life.  Yet in this case it’s quite literally true: I did read a great policy today.  Surf on over to Will Weider’s blog and check out his organization’s draft Social Media Policy.

Let me try and articulate some of the reasons why I like this policy document so much:

  • It takes something that many organizations are viewing as a threat– the use of social media by their employees– and instead dares to envision an opportunity instead.
  • It’s inclusive, recognizing and supporting the role that all employees have in promoting the business.
  • It’s respectful to employees, treating them as adults with decision making capabilities rather than naughty children who need specific rules in order to be kept in line.
  • It discusses the problems that can occur with social media outside of the contexts that directly affect the reputation and bottom-line of the organization (in particular see “Don’t jeopardize your reputation and/or future employment opportunities” and “Don’t alienate your co-workers”).
  • It’s educational.  It’s readable.  It’s short.  And it clearly indicates who has authority and who to contact in case of questions about the policy.

I tell you, if more policies were written this way maybe “policy” wouldn’t be such a dirty word to people.

The other thing I love about Will’s posting is that he’s clearly demonstrating the opportunity of Social Media by posting a draft of the policy on the web for comments.  I challenge you to read the draft policy and not think, “Hey, this company seems to have their head screwed on straight.  This might be a good place to work.”  Oh yeah, and he might get some comments directly about the policy that might help improve the document.  “Open Source” policy, anyone?

So kudos to Will and the team at Ministry Health for an excellent piece of work.  By the way, all of you IT folks reading this blog should consider following Will (@CandidCIO) on Twitter.

Hal Pomeranz, Deer Run Associates

Today I received a rather alarmed email from one of my customers who’s on the faculty at a large research university here in the US.  Apparently, email originating from the server I’m maintaining for this customer is being bounced by the mail servers at the educational institutions that are users of our software.

Examining the bounce messages, I find that they’re originating from anti-spam appliances sold by Barracuda Networks, Inc.  Each bounce message contains a URL pointing you to an explanatory web page, which indicates that the messages are being bounced because the outgoing email servers for the Engineering department at this large university have been listed in Barracuda’s “bad reputation” blacklist.  There is a laundry list of reasons cited as to why these mail servers may have been listed, but no clear indication of the actual offense that caused these specific servers to be listed.

However, there is this little highlighted tidbit on the web page:

One way to get your email through spam filters even if you are listed on the BRBL is to register your domain and IPs at EmailReg.org. Email administrators can configure their systems to use EmailReg.org to apply policy to inbound email. Emails from domain names and IP addresses that are properly registered on EmailReg.org can be automatically exempted from spam filtering defense layers on Barracuda Spam Firewalls, preventing your email from being accidentally blocked.

Surfing on over to EmailReg.org I discover that getting your server address “properly registered” requires a $20 “administrative charge”– apparently per server.  Furthermore, it seems that EmailReg.org is at least receiving hosting equipment from Barracuda Networks.  There is little other information to be found regarding who exactly is behind EmailReg.org.

But let me tell you what it smells like to me– it smells like a “protection racket” being run by Barracuda Networks.  They can add arbitrary senders to their “bad reputation” blacklist and then prominently advertise the services of EmailReg.org as a mechanism for being removed from the blacklist.  Judging by the number of bounce messages my client is receiving, being blacklisted by Barracuda devices cuts you off from sending email to a significant number of organizations.  Many companies, even legitimate senders, will likely pay the $20 just to avoid the hassle.  If, as I suspect, Barracuda Networks is receiving some commercial gain from EmailReg.org, then this is conduct of the lowest order.

I have filed a complaint with the US Federal Trade Commission, asking them to investigate this matter.  I urge everybody who has had similar experiences to file similar complaints with the appropriate organization for your jurisdiction.

Skunkworks in the Clouds

April 23, 2009

Hal Pomeranz, Deer Run Associates

Skunks... clouds... get it?

I was recently asked to make a guest appearance on a podcast related to information security in “the cloud”.  One of the participants brought up an interesting anecdote from one of his clients.  Apparently the IT group at this company had been approached by a member of their marketing team who was looking for some compute resources to tackle a big data crunching exercise.  The IT group responded that they were already overloaded and it would be months before they could get around to providing the necessary infrastructure.  Rebuffed but undeterred, the marketing person used their credit card to purchase sufficient resources from Amazon’s EC2 to process the data set and got the work done literally overnight for a capital cost of approximately $1800.

There ensued the predictable horrified gasping from us InfoSec types on the podcast.  Nothing is more terrifying than skunkworks IT, especially on infrastructure not under our direct control.  “Didn’t they realize how insecure it was to do that?” “What will happen when all of our users realize how easily and conveniently they can do this?” “How can an organization control this type of risky behavior?” We went to bed immersed in our own paranoid but comfortable world-view.

Since then, however, I’ve had the chance to talk with other people about this situation.  In particular, my friend John Sechrest delivered an intellectual “boot to the head” that’s caused me to consider the situation in a new light.  Apparently getting the data processed in a timely fashion was so critical to the marketing department that they figured out their own self-service plan for obtaining the IT resources they needed. If the project was that critical, John asked, was it reasonable from a business perspective for the IT group to effectively refuse to help their marketing department crunch this data?

Maybe the IT group really was overloaded– most of them are these days.  However, the business of the company still needs to move forward, and the clever problem-solving monkeys in various parts of the organization will figure out ways to get their jobs done even without IT support. “Didn’t they realize how insecure it was to do that?”  No, and they didn’t care.  They needed to accomplish a goal, and they did.

“What will happen when all of our users realize how easily and conveniently they can do this?” My guess is they’re going to start doing it a lot more.  Maybe that’s a good thing.  If the IT group is really overloaded, then perhaps it should think about actually empowering their users to do these kind of “one off” or prototype projects on their own without draining the resources of the core IT group.  Remember that if you let a thousand IT projects bloom, 999 of them are going to wither and die shortly thereafter.  Perhaps IT doesn’t need to waste time managing the death of the 999.

“How can an organization control this type of risky behavior?” You probably can’t.  So perhaps your IT group should provide a secure offering that’s so compelling that your users will want to use your version rather than the commodity offerings that are so readily available.  This solution will have to be tailored to each company, but I think it starts with things like:

  • Pre-configured images with known baseline configurations and relevant tools so that groups can get up an running quickly without having to build and upload their own images.
  • Easy toolkits for migrating data and out of these images in a secure fashion, with some sort of DLP solution baked in.
  • Secure back-end storage to protect the data at rest in these images with no extra work on the part of the users.
  • Integration with the organization’s existing identity management and/or AAA framework so that users don’t have to re-implement their own solutions.
  • Integration with the organization’s auditing and logging infrastructures so you know what’s going on.

Putting together the kind of framework described above is a major IT project, and will require input and participation from your user community.  But once accomplished, it could provide massive leverage to overtaxed IT organizations.  Rather than IT having to engineer everything themselves, they provide secure self-service building blocks to their customers and let them have at it.

Providing architecture support and guidance in the early stages of each project is probably prudent.  After all, the one hardy little flower that blooms and refuses to die may become a critical resource to the organization that may eventually need to be moved back “in house”.  While the fact that the building blocks that were used to create the service are already well-integrated with the organization’s centralized IT infrastructure will help, having a reasonable architectural design from the start will also be a huge help when it comes time to migrate and continue scaling the service.

Am I advocating skunkworks IT?  No, I like to think I’m advocating self-service IT on a grand scale.  You’ll see what skunkworks IT looks like if you ignore this issue and just let your users develop their own solutions because you’re too busy to help them.

Hal Pomeranz, Deer Run Associates

awful-wiring-2-smallerOne of my personal and professional mantras is, “It doesn’t take that much longer to do it right.”  Sure, you can always do a half-assed job on some project just to shove it out the door, but there are inevitably down-stream costs.  Whether it’s time lost having to go back and fix your broken junk or customer frustration and negative perception, your short-cut decision usually ends up costing you more in the long run.

Of course you know I have a story to illustrate this principle.  During the peak of the dot-com boom I was helping a small start-up company move out of their sub-lease arrangement into their new permanent home.  The weekend of the move, we had contracted with a moving company to do the physical move of all the equipment and personal items and I was helping the IT team for the company do the setup, server room build out, telephony configuration, etc.  The building we were moving into was a two-story affair, and the plan for the equipment that was moving onto the second floor was to arrange it on pallets, wrap the pallet securely, and forklift the pallets through a second-story window.  Pretty standard practice, actually.

The moving company, however, had negotiated a fixed-price bid.  So it was in their best interests to get the work done as quickly as possible.  Without our knowledge, they decided to throw caution to the wind and not wrap the palleted equipment.  Sure enough, the first pallet went up on the forklift and as the pallet was moving forward through the window, one of the computers toppled off the edge and fell a full story onto concrete.  The punchline is that the computer belonged to the office manager for the company and was the computer that was going to cut the check to the moving company.  We were actually able to recover the hard drive from the twisted chassis and boot it in another PC, but the moving company had to pay a significant penalty that more than erased any savings that they might have achieved from not wrapping the pallets.  Also, after that incident, we made them stop and wrap all the pallets anyway, which cost them even more time.

Once the equipment was loaded in, the IT team and I could get started with our part of the project.  And it was a big project:  from our Friday evening start to late Sunday night, I think we got maybe 12 hours of downtime to sleep.  But there we were Sunday night and everything appeared to be ready for business to resume Monday morning.

Exhausted, but feeling pretty good about ourselves, we were standing admiring the new server room and somebody pointed out that we forgot to put the covers back on the cable raceways.  There was a collective groan.  We were “done”– surely replacing the covers could wait until the following week when we were all less tired?  But when we all looked each other in the eye, we knew that the upcoming week was going to be so chaotic that if we put off doing it now we’d never get around to it in a timely fashion.  So, without a word being spoken, we all grabbed some covers and spent another half hour or so fitting them into place.

When the execs toured the facility the following morning, we actually did get some compliments about how “neat” and “professional” the server room looked, but I don’t think that’s really why a crew of exhausted geeks spent an extra half hour re-assembling cable raceways.  Nor was it because we’re anal retentive or budding masochists.  I think it was because (a) we wanted to establish a culture of “doing things right” in the new server room so that entropy wouldn’t set in so quickly, and (b) we knew that going back and fixing the problem later would take longer than doing it right then.

So when you find yourself looking for excuses not to “finish” a project and just “get ‘er done”, take a moment and think about whether expediency is really the best policy.  Remember that bugs are always cheaper to fix in development than after the product ships.  It doesn’t take that much longer to do it right.

Avoiding Avoidance

April 6, 2009

Hal Pomeranz, Deer Run Associates

At a recent tech event, I ended up having a conversation with Wendy Kincade and Colleen Dick about how we sometimes let critical projects slide. Wendy said a very wise thing, which is that when people are avoiding a project, it’s most often because they don’t think they’ll reach a successful outcome.  In some sense it becomes a vicious cycle: we know we really should be making progress on that big project, but yet we somehow always find other things to be working on that allow us to avoid it.  Of course, the longer we avoid it, the less time we have to complete the project and it only becomes bigger and scarier as a result of the shrinking time window.

Full speed into the unknown

I’ve certainly come to recognize this behavior in myself.  It’s much more comfortable to spend your days doing small, tactical tasks that have short completion times and satisfying outcomes.  It’s a huge leap of faith to set out to tackle and enormous project when you’re not sure if you have all the skills necessary to accomplish the task, where the end of the project is not clearly in sight, and where the cost of failure may be high.  It gives me an uncomfortable feeling in the pit of my stomach, not unlike the feeling you get when contemplating stepping off from a great height.

When I recognize this feeling in myself, I immediately take steps to start tackling the project, because I know from previous experience that if I let it linger the situation is only going to get worse.  Here are some tactics that I’ve developed for getting over the hump:

1. Break it up: Vast monolithic projects are daunting, so break the project up into a set of deliverables, milestones, and dependencies.  Then outline the steps necessary to reach each component of the project.  You don’t have to create a formal project plan– in fact, I’ve seen people spend all their time grooming a plan in MS Project, just to avoid getting started on the actual work.  A simple outline format is fine.

2. Pick an easy one: Once you’ve got a notion of the individual tasks you need to accomplish to finish your project, pick one of the tasks that you think you can complete quickly and get it done.  During our conversation, Colleen commented, “I know that if I can just knock one thing down, that gives me energy to push further into the project.”

3. Make it fun: What motivates you?  I really enjoy figuring out and mastering new technology.  So if there’s a component of the project that requires me to do a bunch of research to figure something out, I’ll tend to do that first plus give myself leeway to spend extra time really getting mastery of that subject.  While I need to be careful to prevent turning the research into an avoidance exercise in and of itself, I also know that any mastery I acquire will be useful at some point in the future, even if it’s not directly relevant to the project at hand.  Some people reward themselves after completing a particular part of the project– take a break to play your favorite video game, hang out with friends, go for a hike, whatever.

4. Consider past success: Reflect on the fact that you’ve accomplished difficult tasks in the past.  Remember the satisfaction you felt when you finally shipped those projects.  Use these feelings to reinforce your belief that you’ll be successful at the project you’re currently embarking on.

While I’ve come to know my own avoidance behaviors and learned to take steps to work around them, I don’t think I’ll ever be entirely free of them.  I think it’s just a natural human risk aversion response.  However, I also recognize that one has to take risks to accomplish great things.  I have a quote from the explorer Magellan on the wall in my office and I look at it often:

Unlike the mediocre, the intrepid spirits seek victory over those things that seem impossible… They embark on the most daring of all endeavors… to meet the shadowy future without fear and conquer the unknown.

Hal Pomeranz, Deer Run Associates

In a comment to Monday’s post on Change Management, John Moore wrote:

I would argue, however, that this level of change management is only appropriate once you reach a certain size company. If the company is more than 100 people, you need to have these policies in place and you must have enforcement, or the cost for running the IT team in a manner that benefits the company is impossible.

In startups, where I have spent much of the last decade, the change management systems you have defined above would be overly prohibitive and remove the flexibility that is critical for success.

John’s not alone in expressing this view.  I’ve heard similar sorts of comments from companies of all different sizes– some of which were substantially larger than John’s suggested 100 person threshold.  But I think that change management is important regardless of what size you’re at, and it doesn’t have to remove any “flexibility” or “agility” from the organization.  Quite the contrary, appropriate change management should enable the organization to move more rapidly because it reduces failed changes an unplanned work that suck resources that could otherwise be more productively channeled.

The key word there is “appropriate”.  Of course the change management process in a 3-10 person start-up looks completely different from the process in a company with hundreds of employees.  In an early-stage start-up you’ve typically got a team of people working very closely together with laser focus on a single line of business.  You don’t tend to have the kind of “process flow control” issues that larger companies do, where you need change review meetings to balance competing priorities and competing schedule issues.

But even three-person start ups need to make production changes thoughtfully and with rigor.  It’s easy to think “we know what we’re doing” and get yourself into a lot of trouble and cause a significant outage.  It doesn’t take that much longer to sit down and write a detailed implementation plan, have one of your co-workers review it, and then execute it (Hickstein’s, “Think, think, think, type, type, type, `beer’!”).  And the bonus is that history of implementation plans helps you when you need to grow your infrastructure, because now you have the documented list of configuration changes necessary to produce replicas of your existing systems.

Do I think a three-person start-up needs formal change control meetings?  Heck no!  If you have regular Engineering meetings, set aside a little time to mention scheduled production updates (if any) and solicit feedback.  If you don’t have regular meetings, set up an email alias where notices of production changes can be posted.  That way, at least everybody will be aware of the current state of affairs on the production systems (or can refer back to the archives as appropriate), which is critical information for them to know as they’re developing code for those platforms.

I would, however, recommend that you implement some sort of configuration control process on your production systems.  It could be as simple as implementing an Open Source utility like AIDE or Samhain, just to keep an eye on what’s happening on the system.  Aside from alerting you to cockpit error on the part of your own people, these kinds of tools can also alert you to more nefarious activity and are part of a good baseline security posture.

At some point in the growth cycle of the company, you’re going to start getting feedback from developers that they “don’t care” about the production update notices.  Congratulations! You’ve just reached a major milestone in your company’s maturation process– the beginning of separation of duties.  This is probably also around the time you’ll be hiring your first full-time IT person, so start soliciting resumes.

Your change management processes will also start adjusting to your new realities.  Your new IT person is going to become the keeper of the implementation plans and other change documentation.  They’ll probably also start pushing you for more formal outage windows, just so they can have some predictability in the environment.  And they’re also going to start pushing back on the developers to keep them from making direct changes on the production systems.  Let these things happen.

The next thing you know, you’re going to look up and realize you’ve got several IT folks and they’ve got their own manager.  Furthermore, you’ve got several products now being developed concurrently.  This is the stage where John suggests that your company needs to start embracing a formal change management process like the one described in Visible Ops, and I agree.  Hopefully you figure out you’ve reached this stage before you have a production outage caused by multiple, badly coordinated updates.

Just like the wrong time to fix bugs in your product is after the product has shipped, it’s wrong to try and build a culture of change management from scratch in an established company.  It is very hard to change a “cowboy culture” once it’s been allow to establish itself.  Visible Ops has a quote from Dr. Bob Doppelt, who was actually speaking of public health matters when he uttered it, but it is nonetheless appropriate: “The righter we do the wrong things, the wronger we become.” The problem is that inattention to change management can appear to work for a period of time– mostly because nobody’s bothering to track the amount of time lost to firefighting and unplanned work.  But suddenly an organization wakes up and realizes that they’ve become utterly crushed by the tyranny of unplanned work.  Digging out of this hole is painful.

So resist the notion that change management is “only for big companies”.  Don’t you hope to be a big company some day? Well you’re not going to receive an angelic visitation complete with fully-functioning change management process on the magic day you somehow cross the “big company” threshold.  Better instead to be a small company that believes strongly in change management and grows naturally into a formal change management process.

Hal Pomeranz, Deer Run Associates

Lately Gene Kim, Kevin Behr, and I have been on a nearly messianic crusade against IT suckage.  Much of our discussion has centered around The Visible Ops Handbook that Gene and Kevin co-authored with George Spafford. Visible Ops is an extremely useful playbook containing four steps that IT groups can follow to help them become much higher performing organizations.

However, I will admit that Visible Ops is sometimes a hard sell.  That’s because the first step of Visible Ops is to create a working change management process within the IT organization– with functional controls and real consequences for people who subvert the change management process.  Aside from being a difficult task in the first place, just the mere concept of change management causes many IT folks to start looking for an exit.  “We hate change management!” they say.  “Don’t do this to us!”  What I quickly try to explain to them is that they don’t hate change management, they just hate bad change management.  And, unfortunately, bad change management is all they’ve experienced to date, so they don’t know there’s a better way.

What are some of the hallmarks of bad change management processes?  See if any of these sound familiar to you:

1. Just a box-checking exercise: The problem here is usually that an organization has implemented change management only because their auditors told them they needed it.  As a result, the process is completely disconnected from the actual operational work of IT in the organization.  It’s simply an exercise in filling out and rubber-stamping whatever ridiculous forms are required to meet the letter of the auditors’ requirements.  It does not add value or additional confidence to the process of making updates in the environment.  Quite the contrary, it’s just extra work for an already over-loaded operations staff.

2. No enforcement: The IT environment has no controls in place to detect changes, much less unauthorized changes.  If the process is already perceived as just a box-checking exercise and IT workers know that no alarms will be raised if they make a change without doing the paperwork, do you think they’ll actually follow the change management process?  Visible Ops has a great story about an organization that implemented a change management process without controls.  In the second month changes were down by 50%, and another 20% in month three, yet the organization was still in chaos and fighting with constant unplanned outages.  When they finally implemented automated change controls, they discovered that the rate of changes was constant, it’s just  the rate of paperwork that was declining.

3. No accountability: What does the organization do when they detect an unauthorized change?  The typical scenario is when a very important member of the operations or development staff makes an unauthorized change that ends up causing a significant outage.  Often this is where IT management fails their “gut check”– they fear angering this critical resource and so the perpetrator ends up getting at worst a slap on the wrist.  Is it any wonder then that the rest of the organization realizes that management is not taking the change management process seriously and thus the entire process can be safely ignored without individual consequences?

I firmly believe that change management can actually help an organization get things done faster, rather than slower.  Seems counter-intuitive, right?  Let me give you some recommendations for improving your change management process and talk about why they make things better:

1. Ask the right questions: What systems, processes, and business units will be affected? During what window will the work be done? Has this change been coordinated with the affected business units and how has it been communicated? What is the detailed implementation plan for performing the change? How will the change be tested for success? What is the back-out plan in case of failure?

Asking the right questions will help the organization achieve higher rates of successful changes, which means less unplanned work.  And unplanned work is the great weight that’s crushing most low-performing IT organizations.  As my friend Jim Hickstein so eloquently put it, “Don’t do: think, type, think, type, think, type, `shit’! Do: think, think, think, type, type, type, `beer’!”  Also, coordinating work properly with other business units means less business impact and greater overall availability.

2. Learn lessons: The first part of your change management meetings should be reviewing completed changes from the previous cycle.  Pay particular attention to changes that failed or didn’t go smoothly. What happened? How can we make sure it won’t happen next time?  What worked really well?  Like most processes, change management should be subject to continuous improvement.  The only real mistake is making the same mistake twice.

Again the goal of these post-mortems should be to drive down the amount of unplanned work that results from changes in the IT environment.  But hopefully you’ll also learn to make changes better and faster, as well as stream-lining the change management process itself.

3. Keep appropriate documentation: Retain all documentation around change requests, approvals, and implementation details. The most obvious reason to do this is to satisfy your auditors.  If you do a good job organizing this information as part of your change management process, then supplying your auditors with the information they need really should be as easy as hitting a few buttons and generating a report out of your change management database.

However, where all this documentation really adds value on a day-to-day basis is when you can tie the change management documentation into your problem resolution system.  After all, when you’re dealing with an unplanned outage on a system, what’s the first question you should be asking?  “What changed?”  Well, what if your trouble tickets automatically populated themselves with the most recent set of changes associated with the system(s) that are experiencing problems?  Seems like that would reduce your problem resolution times and increase availability, right?  Well guess what?  It really does.

4. Implement automated controls and demand accountability: If you want people to follow the change management process, they have to know that unplanned changes will be detected and consequences will ensue.  As I mentioned above, management is sometimes reluctant to following through on the “consequences” part of the equation.  They feel like they’re held hostage to the brilliant IT heroes who are saving the day on a regular basis yet largely ignoring the change management process.  What management needs to realize is that it’s these same heroes who are getting them into trouble in the first place.  The heroes don’t need to be shown the door, just moved into a role– development perhaps– where they maybe don’t have access to the production systems.

Again, the result is less unplanned work and higher availability.  However, it’s also my experience that having automated change controls also teaches you a huge amount about the way your systems and the processes that run on them are functioning.  This greater visibility and understanding of your systems leads to a higher rate of successful changes.

The great thing about the steps in Visible Ops is that each step gives back more resources to the organization than it consumes.  The first step of implementing proper and useful change management processes is no exception.  You probably won’t get it completely right initially, but if you’re committed to continuous improvement and accountability, I think you’ll be amazed at the results.

When benchmarking the high-performing IT organizations identified in Visible Ops, the findings were that these organizations performed 14 times more changes with one quarter the change failure rate of low-performing organizations, and furthermore had one third the amount of unplanned work and 10x faster resolution times when problems did occur.  For the InfoSec folks in the audience, these organizations were five times less likely to experience a breach and five times more likely to detect one when it occurred.  Further these organizations spent one-third the time on audit prep compared to low-performing organizations and had one quarter the number of repeat audit findings.

If change management is the first step on the road to achieving this kind of success, why wouldn’t you sign up for it?