Hal Pomeranz, Deer Run Associates

Early in my career, I had the opportunity to listen to a talk by Bill Howell on “managing your manager”.  I don’t recall much about the talk, but one item that stuck with me was his advice, “Never argue with your boss, because even if you ‘win’, you lose.”

At the time, I was young and cocksure and tended towards confrontation in my interactions with co-workers.  If I disagreed with somebody, we each threw down our best technical arguments, wrangled over the problem, and may the biggest geek win.  Being “right” was the most important thing.  So Bill’s advice seemed outright wrong to me.  Of course one should argue with their boss!  If they were “wrong”, then let’s mix it up and get to the “correct” solution.

Flash forward a few years, and I was working as a Senior Sys Admin at a company in the San Francisco Bay Area.  We were trying to roll out a new architecture for supporting our developer workstations, and I was clashing with my boss over the direction we should go in.  Worse still, the rest of the technical team was in favor of the architecture that I was championing.  True to form, I insisted on going for the no-holds-barred public discussion.  This, of course, transformed the situation from a simple technical disagreement into my completely undercutting my boss’ authority and basically engineering a mutiny in his group.

Matters came to a head at our weekly IT all-hands meeting.  Because of the problems our group was having, both my boss and his boss were in attendance.  Discussion of our new architecture got pretty heated, but I had an answer for every single one of my boss’ objections to my plan.  In short, on a technical level at least, I utterly crushed him.  In fact, in the middle of the meeting he announced, “I don’t need this s—“, and walked out.  I had “won”, and boy was I feeling good about it.

Then I looked around the table at the rest of my co-workers, all of whom were staring at me with looks of open-mouthed horror.  I don’t think they could have been more shocked if I had bludgeoned my boss to death with a baseball bat.  And frankly I couldn’t blame them.  If I was willing to engineer a scene like the one that had just transpired in our all-hands meeting, how could they trust me as a member of their team?  I might turn on them next.  Suddenly I didn’t feel so great.

I went home that night and did a great deal of soul-searching.  Bill Howell’s words came back to me, and I realized that he’d been right.  Admittedly, my case was an extreme situation, but if I had followed Bill’s advice from the beginning, things need never have escalated to the pitch that they finally reached.  The next morning, I went in and apologized to my boss and agreed to toe the line in the future, though it certainly felt like a case of too little too late.  I also started looking for a new job, because I realized nobody there really wanted to work with me after that.  I was gone a month later, and my boss lasted several more years.

My situation in this case was preventable.  As I look back on it now, I realize that my boss and I could probably have worked out some face-saving compromise behind closed doors before having any sort of public discussion.  Of course, sometimes you find yourself in an impossible situation, whether because of incompetence, malice, or venality on the part of your management.  In these cases you can sit there and take it (hoping that things will get better), fight the good fight, or “vote with your feet” and seek alternate employment.  The problem is that fighting the good fight often ends with you seeking alternate employment anyway, so be sure to start putting out feelers for a new job before entering the ring.  Sitting there and taking it should be avoided if at all possible; I’ve seen too many of my friends have their self-esteem totally crippled by psycho managers.

Bottom line is that one of the most important aspects of any job is making your boss look good whenever possible.  This doesn’t mean you can’t disagree with your boss.  Just make sure that you don’t have those disagreements publicly, and make it clear at all times that you’re not attempting to undercut your manager’s authority.  “Managing up” is a delicate skill that needs to be honed with experience, but as a first step at least try to avoid direct, public disagreements with those above you in the management chain.

And thanks for the advice, Bill.  Even if I didn’t listen to you the first time.


The Blame Game

March 5, 2009

Hal Pomeranz, Deer Run Associates

“A strange game. The only winning move is not to play.”

from the movie “WarGames” (1983)

One classic pathology of low-performing IT organizations is that when an outage occurs they spend an inordinate amount of time trying to figure out whose fault it is, rather than working the problem.  Dr. Jim Metzler has even coined a new metric for this activity: Mean Time To Innocence (MTTI), defined as the average time it takes each part of the IT Operations organization to demonstrate that it’s not responsible for the outage.  Whether you call it a “Witch Hunt” or “The Blame Game” or identify it with some other term, it’s a huge waste of time and ends up making everybody involved look like a complete ignoramus.  It’s also one of the classic signs to the rest of the business that IT Operations is completely out of touch, because otherwise they’d be trying to solve the problem rather than working so hard at finding out whose fault it is.

I’m so intolerant of this kind of activity that I will often accept the blame for things I’m not responsible for just so we can move out of the “blame” phase into the “resolution” phase.  As the CIO in Metzler’s article so eloquently put it, “I don’t care where the fault is, I just want them to fix it.”   At the end of the day, nobody will remember whose fault it is, because once the problem is addressed they’ll forget all about it in the rush of all the other things they have to do.  At most they’ll remember the “go-to guy/gal” who made the problem go away.

To illustrate, let me tell you another story from my term as Director of IT for the Internet skunkworks of a big Direct Mail Marketing company.  We were rolling out a new membership program, and as an incentive we were offering the choice of one of three worthless items of Chinese-made junk with each new membership.  I’m talking about the kind of stuff you see as freebie throw-ins on those late-night infomercials: book lights, pocket AM/FM radios, “inside the shell” egg scramblers, etc.  The way new members got their stuff was that we passed a fulfillment code to the back-end database at corporate, which triggered the warehouse to mail the right piece of junk to the new member’s address.

About a week and a half into the campaign our customer support center started getting lots of angry phone calls: “Hey! I requested the egg scrambler and got this crappy book light instead.”  This provoked a very urgent call from one of the supervisors at the call center.  I said it sounded like there was a problem somewhere in the chain from our web site into fulfillment and that I’d get to the bottom of it; in the meantime, we agreed that the best policy was to tell customers to keep the incorrectly shipped junk as our gift and that we’d also send them the junk they had requested.

Once we started our investigation, the problem was immediately obvious.  We had an email from the fulfillment folks with the code numbers for the various items, and those were the code numbers programmed into our application. However, when we checked the list of fulfillment codes against the back-end data dictionary, we realized that they’d transposed the numbers for the various items when they sent us the email.  Classic snafu and an honest mistake.  Once we figured out the problem, it took seconds to fix the codes and only a few minutes to run off a report listing all of the new members who were very shortly going to be receiving the wrong items.
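Incidentally, the kind of cross-check that caught the problem is easy to automate.  Here’s a minimal sketch, with made-up item names and codes, that compares the fulfillment codes an application uses against the authoritative back-end data dictionary and flags any disagreements:

    # Minimal sketch with hypothetical item names and codes: compare the
    # fulfillment codes an application uses against the authoritative
    # back-end data dictionary and flag any disagreements.

    app_codes = {                  # codes as programmed into the web application
        "book light": 101,
        "AM/FM radio": 102,
        "egg scrambler": 103,
    }

    data_dictionary = {            # codes as defined by the back end (source of truth)
        "book light": 103,
        "AM/FM radio": 102,
        "egg scrambler": 101,
    }

    for item, app_code in sorted(app_codes.items()):
        true_code = data_dictionary.get(item)
        if true_code is None:
            print(f"{item}: not present in the data dictionary")
        elif app_code != true_code:
            print(f"{item}: application sends {app_code}, dictionary expects {true_code}")

Run against the real data before launch, a check like this would have caught the transposition during testing instead of a week and a half into the campaign.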

So the question then became how to communicate the problem and the resolution to the rest of the business.  I settled for simplicity over blame:

“We have recently been made aware of a problem with the fulfillment process in the new rollout of member service XYZ, resulting in new members receiving the wrong promotional items.  This was due to fulfillment codes being incorrectly entered into the web application.  We have corrected the problem and have provided Customer Service and Fulfillment with the list of affected members so that they can ship the appropriate items immediately.”

You will note that I carefully didn’t specify whose “fault” it was that the incorrect codes were inserted into the application.  I’m sure the rest of the business assumed it was my team’s fault.  I’m sure of this because the Product Manager in charge of the campaign called me less than fifteen minutes after I sent out the email and literally screamed at me (I was holding the phone away from my ear) that he knew it wasn’t our fault (he’d seen all the email traffic during the investigation), and how could I let the rest of the company assume we were at fault?

And I told him what I’m going to tell you now: nobody else cared whose fault it was.  Fulfillment was grateful that I’d jumped on this particular hand grenade and saved them from the shrapnel.  My management was impressed that we’d resolved the problem less than two hours after the initial reports and had further produced a list of the affected members so that Customer Service could get out ahead of the problem rather than waiting for irate customers to call.  Total cost was minimal because we caught it early and addressed it promptly.

And that’s the bottom line: all that throwing blame around would have done was make people angry and lengthen our time to resolution.  Finding somebody to blame doesn’t make you feel justified or more fulfilled somehow; it just makes you tired and frustrated.  So always try to short-circuit the blame loop and move straight into making things better.

Hal Pomeranz, Deer Run Associates

At the end of our recent SANS webcast, Mike Poor closed by emphasizing how important it was for IT and Information Security groups to advertise their operational successes to the rest of the organization (and also to their own people).  Too often these functions are seen as pure cost centers, and in these difficult economic times it’s up to these organizations to demonstrate return value or face severe cutbacks.

The question is, what are the right metrics to publish in order to demonstrate success?  All too often I see organizations publishing meaningless metrics, or even metrics that create negative cultures and damage the corporate perception of the organization:

  • It seems like a lot of IT Ops groups like to publish their “look how much stuff we operate” metrics: so many thousand machines, so many petabytes of disk, terabytes of backup data per week, etc.  The biggest problem with these metrics is that they can be used to justify massive process inefficiencies.  Maybe you have thousands of machines because every IT project buys its own hardware and you’re actually wasting money and resources that could be saved by consolidating.  Besides, nobody else in the company cares how big your… er, server farm is.
  • Then there are the dreaded help desk ticket metrics: tickets closed per week, average time to close tickets, percentage of open tickets, etc.  The only thing these metrics do is incentivize your help desk to do a slapdash job and thereby annoy your customers.  There’s only one help desk metric that matters: customer satisfaction.  If you’re not doing customer satisfaction surveys on EVERY TICKET and/or you’re not getting good results then you fail.

So what are some good metrics?  Well, I’m a Visible Ops kind of guy, so the metrics that matter to me are things like the amount of unplanned downtime (drive to zero), the number of successful changes requiring no unplanned work or firefighting (more is better), the number of unplanned or unauthorized changes (drive to zero), and projects completed on time and on budget (more is better).  Of course, if your IT organization is struggling, you might be tempted NOT to publish these metrics because they show that you’re not performing well.  In these cases, accentuate the positive by publishing your improvement numbers rather than the raw data: “This month we had 33% less unplanned downtime than last month.”  This makes your organization look proactive and creates the right cultural imperatives without airing your dirty laundry.
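The arithmetic behind that kind of improvement number is trivial, but it’s worth pinning down so that everyone computes it the same way.  Here’s a quick sketch; the downtime figures are invented for illustration:

    # Sketch with invented figures: turn raw monthly unplanned-downtime totals
    # into the month-over-month improvement number you'd actually publish.

    last_month_minutes = 180    # unplanned downtime last month
    this_month_minutes = 120    # unplanned downtime this month

    improvement = (last_month_minutes - this_month_minutes) / last_month_minutes * 100

    print(f"This month we had {improvement:.0f}% less unplanned downtime than last month.")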

There are a couple of other places where I never fail to toot my own horn:

  • If my organization makes life substantially better for another part of the company, then you’d better believe I’m going to advertise that fact.  For example, when my IT group put together a distributed build system that cut product compiles down from over eight hours to less than one hour, it not only went into our regular status roll-ups, but I also got the head of the Release Engineering group to give us some testimonials.
  • Whenever a significant new security vulnerability comes out that is not an issue for us because of our standard builds and/or operations environment, I make sure the people who provide my budget know about it.  It also helps if you can point to “horror story” articles about the amount of money other organizations have had to pay to clean up after incidents related to the vulnerability.  This is one of the few times that Information Security can demonstrate direct value to the organization, and you must never miss out on these chances.

What’s That Smell?

If communicating your successes builds a corporate perception of your organization’s value, being transparent about your failures builds trust with the rest of the business.  If you try to present a relentlessly positive marketing spin on your accomplishments, your “customers” elsewhere in the company will become suspicious.  Plus, you’ll never bamboozle them with your wins so thoroughly that they won’t notice the elephant in the room when you fall on your face.

The important things to communicate when you fail are that you understand what led to the failure, that you have the situational awareness to understand the impact of the failure on the business, and that you’re taking steps to make sure the same failure never happens again (the only real organizational failure is allowing the same failure to happen twice).  Here’s a simple checklist of items you should have in your disclosure statement; a sketch of one way to capture them follows the list:

  • Analysis of the process(es) that led to the failure
  • The duration of the outage
  • How the outage was detected
  • The systems and services impacted
  • Which business units were impacted and in what way
  • Actions taken to end the outage
  • Corrective processes to make sure it never happens again
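If it helps, the checklist can be captured as a fixed structure that your post-mortem process fills in before anyone drafts the announcement, so a disclosure can’t go out with an item missing.  Here’s a rough sketch; the field names are my own, not any standard:

    # Rough sketch: a structure mirroring the disclosure checklist above, plus a
    # helper that reports any items still left blank.  Field names are illustrative.

    from dataclasses import dataclass, fields

    @dataclass
    class OutageDisclosure:
        process_analysis: str      # analysis of the process(es) that led to the failure
        duration: str              # how long the outage lasted
        detection: str             # how the outage was detected
        systems_impacted: str      # systems and services impacted
        business_impact: str       # which business units were impacted, and how
        actions_taken: str         # actions taken to end the outage
        corrective_measures: str   # processes to make sure it never happens again

    def missing_items(disclosure):
        """Return the names of any checklist items that haven't been filled in."""
        return [f.name for f in fields(disclosure) if not getattr(disclosure, f.name).strip()]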

Note that in some cases it’s necessary to split the disclosure across two or three messages.  One is sent during the incident telling your constituents, “Yes, there’s a problem and we’re working it.”  The next is the “services restored at time X, more information forthcoming” message.  And then finally your complete post-mortem report.  Try to avoid partial or incomplete disclosure or idle speculation without all of the facts; you’ll almost always end up with egg on your face.

Conclusion

If you don’t communicate what’s happening in your IT and/or InfoSec organization, then the other business units are basically going to assume you’re not doing anything during the time when you’re not directly working on their requests.  This leads to the perception of IT as nothing more than “revenue-sucking pigs”.

However, you also have to communicate in the right way.  This means communicating worthwhile metrics, ones that don’t create bad cultural imperatives for your organization.  And it also means being transparent and communicating your failures, in the most proactive way possible, to the rest of the organization.