You Don’t Hate Change Management (You Hate Bad Change Management)

Lately Gene Kim, Kevin Behr, and I have been on a nearly messianic crusade against IT suckage.  Much of our discussion has centered around The Visible Ops Handbook that Gene and Kevin co-authored with George Spafford. Visible Ops is an extremely useful playbook containing four steps that IT groups can follow to help them become much higher performing organizations.

However, I will admit that Visible Ops is sometimes a hard sell.  That’s because the first step of Visible Ops is to create a working change management process within the IT organization– with functional controls and real consequences for people who subvert the process.  Aside from being a difficult task in the first place, the mere concept of change management causes many IT folks to start looking for an exit.  “We hate change management!” they say.  “Don’t do this to us!”  What I quickly try to explain is that they don’t hate change management, they just hate bad change management.  And, unfortunately, bad change management is all they’ve experienced to date, so they don’t know there’s a better way.

What are some of the hallmarks of bad change management processes?  See if any of these sound familiar to you:

1. Just a box-checking exercise: The problem here is usually that an organization has implemented change management only because their auditors told them they needed it.  As a result, the process is completely disconnected from the actual operational work of IT in the organization.  It’s simply an exercise in filling out and rubber-stamping whatever ridiculous forms are required to meet the letter of the auditors’ requirements.  It adds no value or additional confidence to the process of making updates in the environment.  Quite the contrary, it’s just extra work for an already overloaded operations staff.

2. No enforcement: The IT environment has no controls in place to detect changes, much less unauthorized changes.  If the process is already perceived as just a box-checking exercise and IT workers know that no alarms will be raised if they make a change without doing the paperwork, do you think they’ll actually follow the change management process?  Visible Ops has a great story about an organization that implemented a change management process without controls.  In the second month, recorded changes were down by 50%, and they dropped another 20% in month three, yet the organization was still in chaos and fighting constant unplanned outages.  When they finally implemented automated change controls, they discovered that the rate of change had stayed constant; only the rate of paperwork had been declining.

3. No accountability: What does the organization do when it detects an unauthorized change?  The typical scenario is that a very important member of the operations or development staff makes an unauthorized change that ends up causing a significant outage.  Often this is where IT management fails its “gut check”– managers fear angering this critical resource, and so the perpetrator gets, at worst, a slap on the wrist.  Is it any wonder, then, that the rest of the organization concludes that management is not taking the change management process seriously, and that the entire process can be safely ignored without individual consequences?

I firmly believe that change management can actually help an organization get things done faster, rather than slower.  Seems counter-intuitive, right?  Let me give you some recommendations for improving your change management process and talk about why they make things better:

1. Ask the right questions: What systems, processes, and business units will be affected? During what window will the work be done? Has this change been coordinated with the affected business units and how has it been communicated? What is the detailed implementation plan for performing the change? How will the change be tested for success? What is the back-out plan in case of failure?

Asking the right questions will help the organization achieve higher rates of successful changes, which means less unplanned work.  And unplanned work is the great weight that’s crushing most low-performing IT organizations.  As my friend Jim Hickstein so eloquently put it, “Don’t do: think, type, think, type, think, type, ‘shit!’  Do: think, think, think, type, type, type, ‘beer!’”  Also, coordinating work properly with other business units means less business impact and greater overall availability.
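
To make this concrete, here is a minimal sketch in Python of what capturing the answers to these questions in a structured change record might look like.  It’s purely illustrative; the field names are my own invention rather than part of any particular change management tool.

    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import List

    @dataclass
    class ChangeRequest:
        """A change record that forces the requester to answer the right questions."""
        summary: str
        affected_systems: List[str]          # What systems will be affected?
        affected_business_units: List[str]   # Who needs to be coordinated with?
        window_start: datetime               # During what window will the work be done?
        window_end: datetime
        communication_plan: str              # How has the change been communicated?
        implementation_plan: str             # Detailed steps for performing the change
        test_plan: str                       # How will the change be tested for success?
        backout_plan: str                    # How do we back out if it fails?
        approvals: List[str] = field(default_factory=list)

        def ready_for_review(self) -> bool:
            """A request with an empty test plan or back-out plan never reaches approval."""
            return all([self.affected_systems, self.communication_plan,
                        self.implementation_plan, self.test_plan, self.backout_plan])

The specific tool doesn’t matter; the point is that a request with an empty back-out plan never makes it onto the change calendar.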

2. Learn lessons: The first part of your change management meetings should be reviewing completed changes from the previous cycle.  Pay particular attention to changes that failed or didn’t go smoothly.  What happened?  How can we make sure it doesn’t happen again?  What worked really well?  Like most processes, change management should be subject to continuous improvement.  The only real mistake is making the same mistake twice.

Again, the goal of these post-mortems should be to drive down the amount of unplanned work that results from changes in the IT environment.  But hopefully you’ll also learn to make changes better and faster, and to streamline the change management process itself.

3. Keep appropriate documentation: Retain all documentation around change requests, approvals, and implementation details. The most obvious reason to do this is to satisfy your auditors.  If you do a good job organizing this information as part of your change management process, then supplying your auditors with the information they need really should be as easy as hitting a few buttons and generating a report out of your change management database.

However, where all this documentation really adds value on a day-to-day basis is when you can tie the change management documentation into your problem resolution system.  After all, when you’re dealing with an unplanned outage on a system, what’s the first question you should be asking?  “What changed?”  Well, what if your trouble tickets automatically populated themselves with the most recent set of changes associated with the system(s) that are experiencing problems?  Seems like that would reduce your problem resolution times and increase availability, right?  Well guess what?  It really does.
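
Here’s a rough sketch of what that tie-in could look like, assuming a hypothetical change database with a simple changes table.  The schema and function names are invented for illustration, not taken from any real ticketing system.

    import sqlite3
    from datetime import datetime, timedelta

    def recent_changes_for(conn: sqlite3.Connection, hostname: str, days: int = 7):
        """Return changes recorded against a host in the last `days` days.

        Assumes a hypothetical `changes` table with (hostname, completed_at, summary)
        columns; adapt to whatever your change database actually looks like.
        """
        cutoff = (datetime.now() - timedelta(days=days)).isoformat()
        rows = conn.execute(
            "SELECT completed_at, summary FROM changes "
            "WHERE hostname = ? AND completed_at >= ? "
            "ORDER BY completed_at DESC",
            (hostname, cutoff),
        )
        return rows.fetchall()

    def annotate_ticket(ticket_body: str, conn: sqlite3.Connection, hostname: str) -> str:
        """Append the host's recent change history to a newly opened trouble ticket."""
        lines = [ticket_body, "", f"Recent changes on {hostname}:"]
        for completed_at, summary in recent_changes_for(conn, hostname):
            lines.append(f"  {completed_at}  {summary}")
        return "\n".join(lines)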

4. Implement automated controls and demand accountability: If you want people to follow the change management process, they have to know that unauthorized changes will be detected and that consequences will ensue.  As I mentioned above, management is sometimes reluctant to follow through on the “consequences” part of the equation.  They feel like they’re held hostage to the brilliant IT heroes who are saving the day on a regular basis yet largely ignoring the change management process.  What management needs to realize is that it’s these same heroes who are getting them into trouble in the first place.  The heroes don’t need to be shown the door, just moved into a role– development, perhaps– where they don’t have access to the production systems.

Again, the result is less unplanned work and higher availability.  It’s also been my experience that automated change controls teach you a huge amount about how your systems, and the processes that run on them, are actually functioning.  This greater visibility and understanding of your systems leads to a higher rate of successful changes.
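
If you’re wondering what the enforcement side might look like, here is a small sketch of the control logic: take a feed of detected changes (from whatever change-detection tool you use) and flag anything that doesn’t fall inside an approved change window.  The data structures here are simplified stand-ins, not a real product’s API.

    from datetime import datetime
    from typing import List, NamedTuple

    class ApprovedChange(NamedTuple):
        hostname: str
        window_start: datetime
        window_end: datetime
        ticket_id: str

    class DetectedChange(NamedTuple):
        hostname: str
        detected_at: datetime
        description: str

    def unauthorized_changes(detected: List[DetectedChange],
                             approved: List[ApprovedChange]) -> List[DetectedChange]:
        """Flag every detected change that doesn't fall inside an approved window
        for that host.  Whatever comes back from this function is the list the
        change manager follows up on; that's where accountability starts."""
        flagged = []
        for change in detected:
            covered = any(
                a.hostname == change.hostname
                and a.window_start <= change.detected_at <= a.window_end
                for a in approved
            )
            if not covered:
                flagged.append(change)
        return flagged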

The great thing about the steps in Visible Ops is that each step gives back more resources to the organization than it consumes.  The first step of implementing proper and useful change management processes is no exception.  You probably won’t get it completely right initially, but if you’re committed to continuous improvement and accountability, I think you’ll be amazed at the results.

When benchmarking the high-performing IT organizations identified in Visible Ops, the findings were that these organizations performed 14 times more changes with one quarter the change failure rate of low-performing organizations, had one third the amount of unplanned work, and resolved problems ten times faster when they did occur.  For the InfoSec folks in the audience, these organizations were five times less likely to experience a breach and five times more likely to detect one when it occurred.  Further, these organizations spent one third the time on audit prep compared to low-performing organizations and had one quarter the number of repeat audit findings.

If change management is the first step on the road to achieving this kind of success, why wouldn’t you sign up for it?

Follow the Money

I’m eternally amazed at how much cheaper computers, disks, networking gear, and pretty much everything IT-related has become since I started working in this industry.  In general, it’s a great thing.  But my friend Bill Schell recently pointed out one of the darker aspects of this trend in an email exchange.  Back in the mid-90s, Bill was running the Asia-Pacific network links for a large multi-national.  The “hub” of the network was a large Cisco router that cost upwards of a quarter of a million dollars.  As Bill pointed out, the company thought nothing of paying him a loaded salary of roughly half the purchase price of that router in order to keep it and the corporate WAN running smoothly.

Fifteen years later, you can get the same functionality in a device that costs an order of magnitude or two less.  And guess what?  Companies expect the costs associated with supporting these devices and the services they provide to drop at roughly the same rate as the cost of the equipment.  This translates into lost IT jobs, or at least their migration to other IT initiatives.  It doesn’t matter that the newer, cheaper devices are just as complex as the expensive equipment they’re replacing, or perhaps even more so.  Nor does it matter that the organization expects the same service levels, or indeed increased support for new applications and protocols.  “Do more with less” is the mantra.

This trend has all sorts of implications: hidden inefficiencies because reduced support levels impact critical business processes, significant security holes left open due to insufficient staffing and expertise, and so on.  But what I want to talk about today is what it means for the career path of my fellow IT workers reading this blog.  Let me cut right to the bottom line: if you want your IT career to be long and profitable, make sure you’re supporting technology that costs a lot of money.  When you see the price of the equipment you’re managing dropping precipitously, start retraining on something new.

Let me give you an example from the early part of my career.  My first job out of college was doing IT support in an environment that was dumping its VAX systems, which cost hundreds of thousands of dollars, for Unix workstations that cost tens of thousands.  Bye-bye VAX administrators, welcome the new, smaller coterie of workstation admins.  And it’s worth noting that the VAX admins had themselves replaced a small army of mainframe support folks from the previous generation.

And now, 20 years later, commodity hardware and virtualization are forcing my generation of system administrators to move up the food chain in search of employment.  Some folks were lucky enough to keep their jobs thanks to server consolidation efforts, but notice that they’re now supporting orders of magnitude more systems in order to justify their salaries in the face of reduced equipment costs.  Storage technology was a nice pot of money to chase for a while there, and many of my peers made the transition into SAN administration and similar jobs.  But downward price pressure is being felt in this arena as well, and the writing is on the wall– “do more with less.”

Some IT career choices seem to have historically provided safe havens.  The cost of database installations seems to have held steady or even increased as organizations have wanted to harness the power of larger and larger data sets and as the number of databases in organizations has exploded.  So DBA has always been a good career choice.  Information Security has also been a steady career choice because its budget is typically a constant fraction of total IT spending, rather than being tied to any particular technology.  Plus all of the recent regulatory requirements have ensured that Information Security’s percentage of the total IT budget has been going up, even as total IT budgets are shrinking.

So please keep these thoughts in the back of your mind as you’re plotting your next career moves in this difficult economy.  I’ve seen too many good friends pushed out the door in the name of “efficiency”.

Barbara Lee (In Honor of Ada Lovelace Day)

March 24 is Ada Lovelace Day.  To honor one of the first female computer scientists, the blogosphere has committed to posting articles about women role models in the computer industry.  This is certainly a scheme that I can get behind, and it also gives me the opportunity to talk about one of my earliest mentors.

When I graduated from college in the late 1980’s, my first job was doing Unix support at AT&T Bell Labs Holmdel.  I learned a huge amount at that job, and a lot of it was due to my manager, Barbara Lee.  “Tough broad” are the only words I can think of to describe Barbara, and I think she’d actually take those words as a compliment.  Completely self-taught, Barbara had worked her way up from the bottom and had finally smacked into a glass ceiling after becoming manager of the Unix administrators for the Holmdel Computing Center.  Barbara was also extremely active in the internal Bell Labs Computer Security Forum, and had earned her stripes tracking down and catching an attacker who had been running rampant on the Bell Labs networks many years earlier.

My vivid mental picture of Barbara is her banging away on her AT&T vt100 clone, composing some crazy complex ed or sed expression to pull off some amazing Unix kung fu, while occasionally taking drags on her cigarette (yes kids, you could still smoke in offices in those days).  Unfortunately, it was those cigarettes that ultimately led to Barbara’s death.

As tough and combative as Barbara was when dealing with most people, she also had a strong caring streak that she mostly kept hidden.  Part Cherokee, Barbara arranged for much of our surplus equipment to make it to reservation schools whenever possible.  As I recall, we even shipped an entire DEC VAX to a reservation while I was there.  I always wondered what they did with that machine, but I’m sure it got put to good use.

And though she didn’t suffer fools gladly, Barbara occasionally took ignorant young savages like me under her wing.  Seeing that I had an interest in computer security, Barbara actually took me along to some of the Bell Labs Computer Security Forum meetings and to the USENIX Security Conference.  Less than a year out of college, and I was getting to hang out with folks like Bill Cheswick and Steve Bellovin.  How cool was that?  Without this early prodding from Barbara, I doubt my career would have turned out the way it did.

My favorite Barbara Lee story, however, involves an altercation I got into with the manager of another group.  At Bell Labs, the Electricians’ Union handled all wiring jobs, including network wiring.  I was doing a network upgrade one weekend and had arranged for the Electricians to run the cabling for me in advance of the actual cutover.  Unfortunately, Friday afternoon rolled around and the wiring work hadn’t even been started.

So I called the manager of that group and asked what the status was.  He told me that he was understaffed due to a couple of his people being unexpectedly out of the office and wouldn’t be able to get the work done.  The conversation went downhill from there, and ended with me getting a verbal reaming and the promise that the Union would take the matter up with Barbara first thing Monday morning.

Needless to say, I was sweating bullets all weekend.  And I can remember the sinking feeling in the pit of my stomach when Barbara walked into my office Monday morning.  “Hal,” she said to me, “you just can’t talk to other managers like you talk to me.”  Then she turned around and walked out and never said another word to me about the incident again.

I’d have walked through fire for that woman.

Never Sell Security As Security

Some months ago, a fellow Information Security professional posted to one of the mailing lists I monitor, looking for security arguments to refute the latest skunkworks project from her sales department.  Essentially, one of the sales folks had developed a thick client application that connected to an internal customer database.  The plan was to equip all of the sales agents in the field with this application and allow them to connect directly back through the corporate firewall to the production copy of the database over an unencrypted link.  This seemed like a terrible idea, and the poster was looking to marshal arguments against deploying this software.

The predictable discussion ensued, with everybody on the list enumerating the many reasons why this was a bad idea from an InfoSec perspective, and in some cases suggesting workarounds to spackle over deficiencies in the design of the system.  My advice was simpler– refute the design on Engineering principles rather than InfoSec grounds.  Specifically:

  • The system had no provision for allowing the users to work off-line or when the corporate database was unavailable.
  • While the system worked fine in the corporate LAN environment, bandwidth and latency issues over the Internet would probably render the application unusable.

Sure enough, when confronted with these reasonable engineering arguments, the project was scrapped as unworkable.  The Information Security group didn’t need to waste any of their precious political capital shooting down this obviously bad idea.

This episode ties into a motto I’ve developed during my career: “Never sell security as security.”  In general, Information Security only gets a limited number of trump cards they can play to control the architecture and deployment of all the IT-related projects in the pipeline.  So anything they can do to create IT harmony and information security without exhausting their hand is a benefit.

It’s also useful to consider my motto when trying to get funding for Information Security related projects.  It’s been my experience that many companies will only invest in Information Security a limited number of times: “We spent $35K on a new firewall to keep the nasty hackers at bay and that’s all you get.”  To achieve the comprehensive security architecture you need to keep your organization safe, you need to get creative about aligning security procurement with other business initiatives.

For example, file integrity assessment tools like Tripwire have an obvious forensic benefit when a security incident occurs, but the up-front cost of acquiring, deploying, and using these tools just for the occasional forensic benefit often makes them a non-starter for organizations.  However, if you change the game and point out that the primary ongoing benefit of these tools is as a control on your own change management processes, then they become something that the organization is willing to pay for.  You’ll notice that the nice folks at Tripwire realized this long ago and sell their software as “Configuration Control”, not “Security”.
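
For readers who haven’t used these tools, the core idea is simple enough to sketch in a few lines of Python: record a cryptographic hash of each file you care about, then periodically compare.  This is a toy illustration of the concept, not Tripwire itself, and the file list below is only an example.

    import hashlib
    import json
    from pathlib import Path

    def snapshot(paths):
        """Record a SHA-256 baseline for a set of files."""
        return {str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest()
                for p in paths if Path(p).exists()}

    def compare(baseline, paths):
        """Report files that changed, disappeared, or appeared since the baseline."""
        current = snapshot(paths)
        changed = [p for p in baseline if p in current and current[p] != baseline[p]]
        missing = [p for p in baseline if p not in current]
        added = [p for p in current if p not in baseline]
        return changed, missing, added

    if __name__ == "__main__":
        watched = ["/etc/passwd", "/etc/ssh/sshd_config"]   # example file list only
        Path("baseline.json").write_text(json.dumps(snapshot(watched), indent=2))
        # Later, on a schedule:
        # baseline = json.loads(Path("baseline.json").read_text())
        # print(compare(baseline, watched))

Any “changed” entry that doesn’t map back to an approved change request is either an unauthorized change or a gap in your change records, and both are worth knowing about.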

Sometimes you can get organizational support from even further afield.  I once sold an organization on using sudo with the blessings of Human Resources because it streamlined their employee termination processes: nobody knew the root passwords, so the passwords didn’t need to get changed every time somebody from IT left the company.  When we ran the numbers, this turned out to be a significant cost-savings for the company.
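
The back-of-the-envelope math is easy to reproduce.  The numbers below are invented purely to show the shape of the calculation, not the figures from that engagement.

    # What does rotating shared root passwords on every IT departure cost?
    # All of the numbers below are made up for illustration.
    servers = 400                 # hosts sharing root passwords
    minutes_per_host = 5          # time to change and record one password
    departures_per_year = 6       # staff with root access leaving per year
    loaded_hourly_rate = 75       # fully loaded cost of an admin-hour, in dollars

    hours_per_rotation = servers * minutes_per_host / 60
    annual_cost = hours_per_rotation * departures_per_year * loaded_hourly_rate
    print(f"{hours_per_rotation:.0f} admin-hours per rotation, "
          f"roughly ${annual_cost:,.0f} per year avoided")

With these made-up numbers, that’s roughly $15,000 a year in avoided busywork, before you even count the security benefit.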

So be creative and don’t go into every project with your Information Security blinders on.  There are lots of projects in the pipeline that may be bad ideas from an Information Security perspective, but it’s likely that they have other problems as well.  You can use those problems as leverage to implement architectures that are more efficient and rational from an Engineering as well as from an Information Security perspective.  Similarly there are critical business processes that the Information Security group can leverage to implement necessary security controls without necessarily spending Information Security’s capital (or political) budget.

Pondering IT Project Management Issues

Lately I was reading another excellent blog post from Peter Thomas in which he discusses different metaphors for IT projects.  As Peter points out, it’s traditional to schedule IT projects as if they were standard real-world construction projects like building a skyscraper.  Peter writes in his blog:

Building tends to follow a waterfall project plan (as do many IT projects). Of course there may be some iterations, even many of them, but the idea is that the project is made up of discrete base-level tasks whose duration can be estimated with a degree of accuracy. Examples of such a task might include writing a functional specification, developing a specific code module, or performing integration testing between two sub-systems. Adding up all the base-level tasks and the rates of the people involved gets you a cost estimate. Working out the dependencies between the base-level tasks gets you an overall duration estimate.

Peter goes on to have some wise thoughts about why this model may not be appropriate for specific types of IT projects, but his description above got me thinking hard about an IT project management issue that I’ve had to grapple with during my career.  The problem is that the kind of planned project work that Peter is discussing above is only one type of work that your IT staff is engaged in.  Outside of the deliverables they’re responsible for in the project schedule, your IT workers also have routine recurring maintenance tasks that they must perform (monitoring logs, shuffling backup media, etc) as well as losing time to unplanned work and outages.  To stretch our construction analogy to its limits, it’s as if you were trying to build a skyscraper with a construction crew that moonlighted as janitors in the neighboring building and were also on-call 24×7 as the local volunteer fire department.  You were expecting the cement for the foundation to get poured on Thursday, but the crew was somewhere else putting out a fire and when they got done with that they had to polish the floors next door, so now your skyscraper project plan is slipping all over the place.

I’ve developed some strategies for dealing with these kinds of issues, but I don’t feel like I’ve discovered the “silver bullet” for creating predictability in my IT project schedules.  Certainly one important factor is driving down the amount of unplanned work in your IT environment.  Constant firefighting is a recipe for failure in any IT organization, but how to fix that problem is a topic for another day.  Another important strategy is to rotate the “on-call” position through the IT group so that only a fraction of your team is engaged in firefighting activities in any given week.  When a person is on call, I normally mark them as “unavailable” on my project schedule, just as if they were out of the office; resource leveling then allows you to more accurately predict the dates for the deliverables they’re responsible for.

Finally, I recognize that IT workers almost never have 100% of their time available to work on IT projects, and I set their project staffing levels accordingly.  I may only be able to schedule 70% of Charlene’s time to project Whiz-Bang, because Charlene is our Backup Diva and loses 30% of her time on average to routine backup maintenance issues and/or being called in to resolve unplanned issues with the backup system.  And notice the qualifier “on average” there– some weeks Charlene may get caught up in dealing with a critical outage with the backup system and not be able to make any progress on her Project Whiz-Bang deliverables.  When weeks like this happen, you hope that Charlene’s deliverables aren’t on the critical path and that she can make up the time later in the project schedule– or you bring in other resources to pick up the slack.
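
If it helps, here is the trivial arithmetic I’m doing in my head when I set those staffing levels; the numbers are illustrative rather than a planning standard.

    def schedulable_hours(weekly_hours: float,
                          maintenance_fraction: float,
                          oncall_weeks: int,
                          total_weeks: int) -> float:
        """Rough estimate of the hours someone can realistically give a project.

        Weeks on call count as zero project time; the remaining weeks lose a
        fixed fraction to routine maintenance and small unplanned issues.
        """
        project_weeks = total_weeks - oncall_weeks
        return project_weeks * weekly_hours * (1 - maintenance_fraction)

    # A 12-week project, 40-hour weeks, 3 weeks on call, 30% maintenance load:
    print(schedulable_hours(40, 0.30, 3, 12))   # 252.0 hours, not 480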

Which brings me to another important piece of strategy I’ve picked up through the years: IT project slippage is inevitable, so you want to catch it as quickly as possible.  The worst thing that can happen is that you get to the milestone date for a multi-week deliverable only to discover that work on this segment of the project hasn’t even commenced.  This means you need to break your IT projects down into small deliverables that can be tracked individually and continuously.  I’m uncomfortable unless the lowest-level detail in my project schedule has durations of a few days or less.  Otherwise your project manager is almost guaranteed to be receiving some nasty surprises much too late to fix the problem.

These are some of the strategies I’ve come up with for managing IT projects, but I’ll still admit to a large amount of trepidation when wrangling big IT efforts.  Have any of you reading this blog developed useful strategies for managing IT projects in your environment?  Let’s discuss them in the comments!

Never Argue With Your Boss

Early in my career, I had the opportunity to listen to a talk by Bill Howell on “managing your manager”.  I don’t recall much about the talk, but one item that stuck with me was his advice, “Never argue with your boss, because even if you ‘win’, you lose.”

At the time, I was young and cocksure and tended towards confrontation in my interactions with co-workers.  If I disagreed with somebody, we each threw down our best technical arguments, wrangled over the problem, and may the biggest geek win.  Being “right” was the most important thing.  So Bill’s advice seemed outright wrong to me at the time.  Of course one should argue with their boss!  If they were “wrong”, then let’s mix it up and get to the “correct” solution.

Flash forward a few years, and I was working as a Senior Sys Admin at a company in the San Francisco Bay Area.  We were trying to roll out a new architecture for supporting our developer workstations, and I was clashing with my boss over the direction we should take.  Worse still, the rest of the technical team was in favor of the architecture I was championing.  True to form, I insisted on a no-holds-barred public discussion.  This, of course, transformed the situation from a simple technical disagreement into my completely undercutting my boss’ authority and basically engineering a mutiny in his group.

Matters came to a head at our weekly IT all-hands meeting.  Because of the problems our group was having, both my boss and his boss were in attendance.  Discussion of our new architecture got pretty heated, but I had an answer for every single one of my boss’ objections to my plan.  In short, on a technical level at least, I utterly crushed him.  In fact, in the middle of the meeting he announced, “I don’t need this s—“, and walked out of the meeting.  I had “won”, and boy was I feeling good about it.

Then I looked around the table at the rest of my co-workers, all of whom were staring at me with looks of open-mouthed horror.  I don’t think they could have been more shocked if I had bludgeoned my boss to death with a baseball bat.  And frankly, I couldn’t blame them.  If I was willing to engineer a scene like the one that had just transpired in our all-hands meeting, how could they trust me as a member of their team?  I might turn on them next.  Suddenly I didn’t feel so great.

I went home that night and did a great deal of soul-searching.  Bill Howell’s words came back to me, and I realized that he’d been right.  Admittedly, my case was an extreme situation, but if I had followed Bill’s advice from the beginning, things need never have escalated to the pitch that they finally reached.  The next morning, I went in and apologized to my boss and agreed to toe the line in the future, though it certainly felt like a case of too little too late.  I also started looking for a new job, because I realized nobody there really wanted to work with me after that.  I was gone a month later, and my boss lasted several more years.

My situation in this case was preventable.  As I look back on it now, I realize that my boss and I could probably have worked out some face-saving compromise behind closed doors before having any sort of public discussion.  Of course, sometimes you find yourself in an impossible situation, whether because of incompetence, malice, or venality on the part of your management.  In these cases you can sit there and take it (hoping that things will get better), fight the good fight, or “vote with your feet” and seek alternate employment.  The problem is that fighting the good fight often ends with you seeking alternate employment anyway, so be sure to start putting out feelers for a new job before entering the ring.  Sitting there and taking it should be avoided if at all possible– I’ve seen too many friends have their self-esteem crushed by psycho managers.

The bottom line is that one of the most important aspects of any job is making your boss look good whenever possible.  This doesn’t mean you can’t disagree with your boss.  Just make sure that you don’t have those disagreements publicly, and make it clear at all times that you’re not attempting to usurp your manager’s authority.  “Managing up” is a delicate skill that needs to be honed with experience, but as a first step, at least try to avoid direct, public disagreements with those above you in the management chain.

And thanks for the advice, Bill.  Even if I didn’t listen to you the first time.

‘Remember You Said Dead’

Fifteen years ago or more I was listening to a presentation by Vint Cerf where he was advocating for the adoption of CIDR as a solution to many of the routing issues the core Internet providers were facing at the time.  In responding to his critics, he made an off-handed comment to the effect that, “People say to me, ‘Vint, you can have my IP prefix when you pry it out of my cold, dead fingers.’ To which I respond, ‘Remember you said dead.'”

Needless to say, this got a huge laugh out of the audience.  But buried in this little comment is a nugget of IT wisdom that applies in many different situations.  To state it more plainly: there are times of compelling change when the most sensible course is to simply ignore the existing installed base and move forward.  By the time you’ve finished your new roll-out, today’s installed base will have been completely subsumed into the new technology.

In Vint’s case, his position was spectacularly vindicated, of course, because the number of Internet-connected hosts grew by an order of magnitude during the period when CIDR was being rolled out.  But this kind of thinking applies equally well, on a smaller scale, to the operational issues faced by everyday IT shops.  Have new baseline images you want to roll out to your organization but are meeting resistance from your user community?  Roll the images out on newly deployed systems only and wait for attrition to take care of the existing installed base.  Given the cycle rate of technology in most organizations, that gives you a half-life of change of about 18 months.
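
The arithmetic behind that 18-month figure is straightforward.  If machines are replaced at a steady rate over a typical three-year refresh cycle and every new machine ships with the new image, the conversion curve looks like this (a rough sketch; the three-year cycle is an assumption, not a universal rule):

    def fraction_converted(months: int, refresh_cycle_months: int = 36) -> float:
        """Fraction of the fleet running the new image after `months`, assuming
        steady replacement over a fixed refresh cycle and that every replacement
        ships with the new image."""
        return min(months / refresh_cycle_months, 1.0)

    for m in (6, 12, 18, 24, 36):
        print(f"{m:>2} months: {fraction_converted(m):.0%} of the fleet converted")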