Hal Pomeranz, Deer Run Associates

In a comment to Monday’s post on Change Management, John Moore wrote:

I would argue, however, that this level of change management is only appropriate once you reach a certain size company. If the company is more than 100 people, you need to have these policies in place and you must have enforcement, or the cost for running the IT team in a manner that benefits the company is impossible.

In startups, where I have spent much of the last decade, the change management systems you have defined above would be overly prohibitive and remove the flexibility that is critical for success.

John’s not alone in expressing this view.  I’ve heard similar sorts of comments from companies of all different sizes– some of which were substantially larger than John’s suggested 100-person threshold.  But I think that change management is important regardless of what size you’re at, and it doesn’t have to remove any “flexibility” or “agility” from the organization.  Quite the contrary, appropriate change management should enable the organization to move more rapidly because it reduces failed changes and unplanned work that suck resources that could otherwise be more productively channeled.

The key word there is “appropriate”.  Of course the change management process in a 3-10 person start-up looks completely different from the process in a company with hundreds of employees.  In an early-stage start-up you’ve typically got a team of people working very closely together with laser focus on a single line of business.  You don’t tend to have the kind of “process flow control” issues that larger companies do, where you need change review meetings to balance competing priorities and competing schedule issues.

But even three-person start-ups need to make production changes thoughtfully and with rigor.  It’s easy to think “we know what we’re doing” and get yourself into a lot of trouble and cause a significant outage.  It doesn’t take that much longer to sit down and write a detailed implementation plan, have one of your co-workers review it, and then execute it (Hickstein’s, “Think, think, think, type, type, type, `beer’!”).  And the bonus is that a history of implementation plans helps you when you need to grow your infrastructure, because now you have the documented list of configuration changes necessary to produce replicas of your existing systems.

Do I think a three-person start-up needs formal change control meetings?  Heck no!  If you have regular Engineering meetings, set aside a little time to mention scheduled production updates (if any) and solicit feedback.  If you don’t have regular meetings, set up an email alias where notices of production changes can be posted.  That way, at least everybody will be aware of the current state of affairs on the production systems (or can refer back to the archives as appropriate), which is critical information for them to know as they’re developing code for those platforms.

I would, however, recommend that you implement some sort of configuration control process on your production systems.  It could be as simple as implementing an Open Source utility like AIDE or Samhain, just to keep an eye on what’s happening on the system.  Aside from alerting you to cockpit error on the part of your own people, these kinds of tools can also alert you to more nefarious activity and are part of a good baseline security posture.
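To make the idea concrete, here is a minimal sketch of what a file integrity check does at its core: record a baseline of content hashes, then compare a later snapshot against it.  This is not how AIDE or Samhain are implemented, and the function names snapshot and compare are my own inventions for illustration.  Real tools also track permissions, ownership, and other attributes, and they protect the baseline itself from tampering.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each regular file under root to the SHA-256 digest of its contents."""
    digests = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            name = str(path.relative_to(root))
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(baseline, current):
    """Return the files added, removed, and modified since the baseline."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(
        name
        for name in set(baseline) & set(current)
        if baseline[name] != current[name]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

You would take a snapshot of a configuration directory right after a planned change, keep it somewhere safe, and run compare against a fresh snapshot on a schedule.  Anything that shows up in the report without a matching implementation plan deserves a conversation.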

At some point in the growth cycle of the company, you’re going to start getting feedback from developers that they “don’t care” about the production update notices.  Congratulations! You’ve just reached a major milestone in your company’s maturation process– the beginning of separation of duties.  This is probably also around the time you’ll be hiring your first full-time IT person, so start soliciting resumes.

Your change management processes will also start adjusting to your new realities.  Your new IT person is going to become the keeper of the implementation plans and other change documentation.  They’ll probably also start pushing you for more formal outage windows, just so they can have some predictability in the environment.  And they’re also going to start pushing back on the developers to keep them from making direct changes on the production systems.  Let these things happen.

The next thing you know, you’re going to look up and realize you’ve got several IT folks and they’ve got their own manager.  Furthermore, you’ve got several products now being developed concurrently.  This is the stage where John suggests that your company needs to start embracing a formal change management process like the one described in Visible Ops, and I agree.  Hopefully you figure out you’ve reached this stage before you have a production outage caused by multiple, badly coordinated updates.

Just like the wrong time to fix bugs in your product is after the product has shipped, it’s wrong to try and build a culture of change management from scratch in an established company.  It is very hard to change a “cowboy culture” once it’s been allowed to establish itself.  Visible Ops has a quote from Dr. Bob Doppelt, who was actually speaking of public health matters when he uttered it, but it is nonetheless appropriate: “The righter we do the wrong things, the wronger we become.” The problem is that inattention to change management can appear to work for a period of time– mostly because nobody’s bothering to track the amount of time lost to firefighting and unplanned work.  But suddenly an organization wakes up and realizes that they’ve become utterly crushed by the tyranny of unplanned work.  Digging out of this hole is painful.

So resist the notion that change management is “only for big companies”.  Don’t you hope to be a big company some day? Well, you’re not going to receive an angelic visitation complete with a fully-functioning change management process on the magic day you somehow cross the “big company” threshold.  Better instead to be a small company that believes strongly in change management and grows naturally into a formal change management process.

Hal Pomeranz, Deer Run Associates

Lately Gene Kim, Kevin Behr, and I have been on a nearly messianic crusade against IT suckage.  Much of our discussion has centered around The Visible Ops Handbook that Gene and Kevin co-authored with George Spafford. Visible Ops is an extremely useful playbook containing four steps that IT groups can follow to help them become much higher performing organizations.

However, I will admit that Visible Ops is sometimes a hard sell.  That’s because the first step of Visible Ops is to create a working change management process within the IT organization– with functional controls and real consequences for people who subvert the change management process.  Aside from being a difficult task in the first place, just the mere concept of change management causes many IT folks to start looking for an exit.  “We hate change management!” they say.  “Don’t do this to us!”  What I quickly try to explain to them is that they don’t hate change management, they just hate bad change management.  And, unfortunately, bad change management is all they’ve experienced to date, so they don’t know there’s a better way.

What are some of the hallmarks of bad change management processes?  See if any of these sound familiar to you:

1. Just a box-checking exercise: The problem here is usually that an organization has implemented change management only because their auditors told them they needed it.  As a result, the process is completely disconnected from the actual operational work of IT in the organization.  It’s simply an exercise in filling out and rubber-stamping whatever ridiculous forms are required to meet the letter of the auditors’ requirements.  It does not add value or additional confidence to the process of making updates in the environment.  Quite the contrary, it’s just extra work for an already over-loaded operations staff.

2. No enforcement: The IT environment has no controls in place to detect changes, much less unauthorized changes.  If the process is already perceived as just a box-checking exercise and IT workers know that no alarms will be raised if they make a change without doing the paperwork, do you think they’ll actually follow the change management process?  Visible Ops has a great story about an organization that implemented a change management process without controls.  In the second month changes were down by 50%, and another 20% in month three, yet the organization was still in chaos and fighting with constant unplanned outages.  When they finally implemented automated change controls, they discovered that the rate of changes was constant; it was just the rate of paperwork that was declining.

3. No accountability: What does the organization do when they detect an unauthorized change?  The typical scenario is when a very important member of the operations or development staff makes an unauthorized change that ends up causing a significant outage.  Often this is where IT management fails their “gut check”– they fear angering this critical resource and so the perpetrator ends up getting at worst a slap on the wrist.  Is it any wonder then that the rest of the organization realizes that management is not taking the change management process seriously and thus the entire process can be safely ignored without individual consequences?

I firmly believe that change management can actually help an organization get things done faster, rather than slower.  Seems counter-intuitive, right?  Let me give you some recommendations for improving your change management process and talk about why they make things better:

1. Ask the right questions: What systems, processes, and business units will be affected? During what window will the work be done? Has this change been coordinated with the affected business units and how has it been communicated? What is the detailed implementation plan for performing the change? How will the change be tested for success? What is the back-out plan in case of failure?

Asking the right questions will help the organization achieve higher rates of successful changes, which means less unplanned work.  And unplanned work is the great weight that’s crushing most low-performing IT organizations.  As my friend Jim Hickstein so eloquently put it, “Don’t do: think, type, think, type, think, type, `shit’! Do: think, think, think, type, type, type, `beer’!”  Also, coordinating work properly with other business units means less business impact and greater overall availability.
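One way to keep those questions from being skipped is to make them fields in whatever record you use for a change request.  The sketch below is only an illustration: the ChangeRequest class and its field names are hypothetical, one per question in the list above, and are not taken from Visible Ops or any particular change management product.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeRequest:
    """One field per question from the list above; the names are illustrative."""

    affected_systems: str = ""     # systems, processes, business units affected
    work_window: str = ""          # when the work will be done
    coordination: str = ""         # how affected units were informed
    implementation_plan: str = ""  # detailed steps for performing the change
    success_test: str = ""         # how the change will be tested for success
    backout_plan: str = ""         # what to do in case of failure

    def unanswered(self):
        """Return the names of the questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]
```

A reviewer can then refuse to schedule any request where unanswered() returns a non-empty list, which turns "did you think about a back-out plan?" from a nag into a precondition.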

2. Learn lessons: The first part of your change management meetings should be reviewing completed changes from the previous cycle.  Pay particular attention to changes that failed or didn’t go smoothly. What happened? How can we make sure it won’t happen next time?  What worked really well?  Like most processes, change management should be subject to continuous improvement.  The only real mistake is making the same mistake twice.

Again the goal of these post-mortems should be to drive down the amount of unplanned work that results from changes in the IT environment.  But hopefully you’ll also learn to make changes better and faster, as well as stream-lining the change management process itself.

3. Keep appropriate documentation: Retain all documentation around change requests, approvals, and implementation details. The most obvious reason to do this is to satisfy your auditors.  If you do a good job organizing this information as part of your change management process, then supplying your auditors with the information they need really should be as easy as hitting a few buttons and generating a report out of your change management database.

However, where all this documentation really adds value on a day-to-day basis is when you can tie the change management documentation into your problem resolution system.  After all, when you’re dealing with an unplanned outage on a system, what’s the first question you should be asking?  “What changed?”  Well, what if your trouble tickets automatically populated themselves with the most recent set of changes associated with the system(s) that are experiencing problems?  Seems like that would reduce your problem resolution times and increase availability, right?  Well guess what?  It really does.
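Here is a rough sketch of that lookup, assuming your change records can be exported as a list of dictionaries with "system" and "when" keys.  That layout, the recent_changes name, and the seven-day default window are all my assumptions for the sake of the example; your change management database will have its own schema and query interface.

```python
from datetime import datetime, timedelta


def recent_changes(change_log, systems, now, window=timedelta(days=7)):
    """Return changes touching any of `systems` within `window` before `now`.

    Results are sorted newest first, since the latest change is the
    likeliest suspect when something has just broken.
    """
    wanted = set(systems)
    hits = [
        change
        for change in change_log
        if change["system"] in wanted and now - window <= change["when"] <= now
    ]
    return sorted(hits, key=lambda change: change["when"], reverse=True)
```

Calling this when a ticket is opened and pasting the result into the ticket body answers “What changed?” before anybody has to ask.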

4. Implement automated controls and demand accountability: If you want people to follow the change management process, they have to know that unplanned changes will be detected and consequences will ensue.  As I mentioned above, management is sometimes reluctant to follow through on the “consequences” part of the equation.  They feel like they’re held hostage to the brilliant IT heroes who are saving the day on a regular basis yet largely ignoring the change management process.  What management needs to realize is that it’s these same heroes who are getting them into trouble in the first place.  The heroes don’t need to be shown the door, just moved into a role– development perhaps– where they maybe don’t have access to the production systems.

Again, the result is less unplanned work and higher availability.  However, it’s also my experience that having automated change controls teaches you a huge amount about the way your systems and the processes that run on them are functioning.  This greater visibility and understanding of your systems leads to a higher rate of successful changes.
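The detection half of this can be sketched as a simple reconciliation: take the changes your monitoring tool detected and flag any that do not fall inside an approved window for the same system.  The data layout here is hypothetical (detected changes as (system, timestamp) pairs, approvals as dictionaries with "system", "start", and "end" keys), and the unauthorized function is an illustration rather than the interface of any real product.

```python
from datetime import datetime


def unauthorized(detected, approved):
    """Return detected changes that fall outside every approved window.

    A detected change counts as authorized only if some approval names the
    same system and its window contains the time the change was observed.
    """
    flagged = []
    for system, when in detected:
        covered = any(
            approval["system"] == system
            and approval["start"] <= when <= approval["end"]
            for approval in approved
        )
        if not covered:
            flagged.append((system, when))
    return flagged
```

An empty result means the paperwork and the systems agree.  A non-empty one is the list management has to be willing to act on, no matter whose name is attached to it.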

The great thing about the steps in Visible Ops is that each step gives back more resources to the organization than it consumes.  The first step of implementing proper and useful change management processes is no exception.  You probably won’t get it completely right initially, but if you’re committed to continuous improvement and accountability, I think you’ll be amazed at the results.

When benchmarking the high-performing IT organizations identified in Visible Ops, the findings were that these organizations performed 14 times more changes with one quarter the change failure rate of low-performing organizations, and furthermore had one third the amount of unplanned work and 10x faster resolution times when problems did occur.  For the InfoSec folks in the audience, these organizations were five times less likely to experience a breach and five times more likely to detect one when it occurred.  Further, these organizations spent one third the time on audit prep compared to low-performing organizations and had one quarter the number of repeat audit findings.

If change management is the first step on the road to achieving this kind of success, why wouldn’t you sign up for it?

Hal Pomeranz, Deer Run Associates

Some months ago, a fellow Information Security professional posted to one of the mailing lists I monitor, looking for security arguments to refute the latest skunkworks project from her sales department.  Essentially, one of the sales folks had developed a thick client application that connected to an internal customer database.  The plan was to equip all of the sales agents in the field with this application and allow them to connect directly back through the corporate firewall to the production copy of the database over an unencrypted link.  This seemed like a terrible idea, and the poster was looking to marshal arguments against deploying this software.

The predictable discussion ensued, with everybody on the list enumerating the many reasons why this was a bad idea from an InfoSec perspective and in some cases suggesting work-arounds to spackle over deficiencies in the design of the system.  My advice was simpler– refute the design on Engineering principles rather than InfoSec grounds.  Specifically:

  • The system had no provision for allowing the users to work off-line or when the corporate database was unavailable.
  • While the system worked fine in the corporate LAN environment, bandwidth and latency issues over the Internet would probably render the application unusable.

Sure enough, when confronted with these reasonable engineering arguments, the project was scrapped as unworkable.  The Information Security group didn’t need to waste any of their precious political capital shooting down this obviously bad idea.

This episode ties into a motto I’ve developed during my career: “Never sell security as security.”  In general, Information Security only gets a limited number of trump cards they can play to control the architecture and deployment of all the IT-related projects in the pipeline.  So anything they can do to create IT harmony and information security without exhausting their hand is a benefit.

It’s also useful to consider my motto when trying to get funding for Information Security related projects.  It’s been my experience that many companies will only invest in Information Security a limited number of times: “We spent $35K on a new firewall to keep the nasty hackers at bay and that’s all you get.”  To achieve the comprehensive security architecture you need to keep your organization safe, you need to get creative about aligning security procurement with other business initiatives.

For example, file integrity assessment tools like Tripwire have an obvious forensic benefit when a security incident occurs, but the up-front cost of acquiring, deploying, and using these tools just for the occasional forensic benefit often makes them a non-starter for organizations.  However, if you change the game and point out that the primary ongoing benefit of these tools is as a control on your own change management processes, then they become something that the organization is willing to pay for.  You’ll notice that the nice folks at Tripwire realized this long ago and sell their software as “Configuration Control”, not “Security”.

Sometimes you can get organizational support from even further afield.  I once sold an organization on using sudo with the blessings of Human Resources because it streamlined their employee termination processes: nobody knew the root passwords, so the passwords didn’t need to get changed every time somebody from IT left the company.  When we ran the numbers, this turned out to be a significant cost-savings for the company.

So be creative and don’t go into every project with your Information Security blinders on.  There are lots of projects in the pipeline that may be bad ideas from an Information Security perspective, but it’s likely that they have other problems as well.  You can use those problems as leverage to implement architectures that are more efficient and rational from an Engineering as well as from an Information Security perspective.  Similarly, there are critical business processes that the Information Security group can leverage to implement necessary security controls without necessarily spending Information Security’s capital (or political) budget.