If I just keep saying it, it will be true

February 28, 2009

I’ve been working on an interesting project at work.  It isn’t the normal “understand this process, help us improve it” kind of work, however.  I might call it “using statistics for evil” except that the intent isn’t really evil.  It’s just that people don’t like what the data is telling them.

The back story is there is a new project X (hmm, that connotes a really secret project, which I don’t think this is, but let’s go with the name anyway) that is being worked on.  The project was going along fine until it got to testing.  For some reason the test team could not get through the test cases.

The claim they made was that the test environments were unstable and that’s why they couldn’t get work done.  To prove their point, they collected detailed start and end times of each outage.  The numbers sure looked impressive.  Hundreds of hours of outages were reported.

One might stop there and say that the recording of outages is proof in itself that work was impeded.  But remember the claim was made only to justify that they couldn’t get their testing done.

Anyway, something seemed fishy about the story to a senior manager who asked me to do some digging.  Drawing on my newfound admiration for Karl Popper, I decided the best way to approach this problem was to test the hypothesis by looking for negative evidence.

The first thing I did was extract the number of test cases executed per day.  Then, I lined them up with the number of hours of outage reported per day.  Assuming test cases would be executed at a fairly constant rate, which they are, days with outages should naturally yield fewer test cases being run.  Seems fair, right?

It was a simple correlation test.  And the result was a Pearson correlation of 0.151.  Essentially, there’s no evidence of a relationship between outages and less work being done.  I will fully admit that all I can find is a lack of evidence to support the hypothesis.  Statistics is unfriendly that way, in that if you find solid evidence you know it’s there, but if you don’t find evidence you don’t know whether the relationship doesn’t exist or just that you couldn’t observe it.
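For anyone who wants to try the same kind of check, here is a minimal sketch in Python.  The daily figures are invented purely for illustration (they are not the project’s data), and `pearson_r` is just a hand-rolled helper name, not anything from the original analysis.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: one entry per testing day.
outage_hours = [0.0, 6.5, 2.0, 0.0, 8.0, 1.5, 4.0, 0.0, 7.0, 3.0]
cases_executed = [41, 44, 38, 40, 42, 39, 45, 43, 40, 41]

r = pearson_r(outage_hours, cases_executed)
print(f"r = {r:.3f}")
```

If outages really were choking off testing, you would expect a clearly negative r here (more outage hours, fewer cases run).  A value near zero only says the data gives no sign of that relationship.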

Anyway, I decided the right thing to do was to keep looking for possible proof/disproof of the hypothesis.  I looked at whether outages resulted in more test case failures or blockages.  They didn’t.  I looked at whether outages resulted in more defects during the period being cancelled.  They didn’t.  My thinking on that one was maybe people were confusing code defects and outages and we’d see a rise in cancelled defect tickets during outage periods.

Then I said, well let’s look at it from a test case duration perspective.  Maybe I can see that test cases take longer when they overlap with outages than if they don’t.  I separated the cases into two populations, those with outages and those without outages occurring during their run.

The data had both unequal variances and significant skewness which unfortunately leaves me without a good hypothesis test to use.  Mann-Whitney, when used to compare the centers of two groups, assumes the two distributions have the same shape and spread.  A 2 sample T can handle unequal variances but assumes a normal distribution.  It can handle some non-normality but not as much as I was seeing.

Still, just for fun I tried a 2 sample T assuming unequal variances.  It resulted in a p-value of 0.000, a huge amount of certainty that the population of test cases with outages took on average longer than test cases without outages.  Visually, looking at the histograms it didn’t look like much to me, but still, it got me a little worked up.
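For reference, the unequal-variance version of the 2 sample T is usually called Welch’s t-test.  Here is a rough sketch of the statistic and its degrees of freedom using only the Python standard library.  The durations are hypothetical, and turning t and df into a p-value (which a stats package would do for you) is left out.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical test case durations in minutes, invented for illustration.
overlapped_outage = [55, 120, 340, 95, 210, 480, 75, 160]
no_outage = [12, 20, 35, 18, 25, 40, 15, 22, 30, 28]

t, df = welch_t(overlapped_outage, no_outage)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large t only says the two group means differ.  It says nothing about why they differ.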

See, I had a report out before the latest experiment and I had shared with the team my lack of evidence for their hypothesis.  That really raised the ire of the managers who had set out their position as “it’s not my fault we couldn’t get the job done, it’s the environment.”  And I was telling them, and their development partners at the same time, that wasn’t true (or at least I could find no evidence of it). 

So, to have data (the result of the 2 sample T test) which supported their view after the fact would mean admitting it.  I’m big enough to admit I’m wrong, but something just didn’t add up.  How could it be that the rate at which test cases were marked complete stayed constant but the duration of test cases with outages was significantly longer?  Then it hit me… I had been momentarily fooled by the same false causation that they had! 

Starting out on a sunny morning, imagine that I take a 5 minute drive down to the local grocery store.  By the time I get there, it is still sunny and it hasn’t rained.  By comparison, let’s imagine that I take a drive across the country.  Rather than the trip taking 5 minutes, like it does to the grocery store, it takes a few days.  Now, even if the weather was perfect the entire trip, it’d still take a few days to drive across the country.  It’s just big.  It takes a long time.  But, my chances of encountering a rainstorm while driving down to the grocery store having started out on a sunny morning are much much less than if I start out on a multi-day trip across the country.  Because of the longer duration, weather regardless, of driving across the country, there are more opportunities for it to rain on me.

And that’s what the team was really seeing.  Tests that are naturally long running, and therefore hard to complete in the small window of time they had to actually do the testing, were more likely to overlap in timing with a reported environment outage.  The environment outage didn’t do anything to really impede progress, but you just weren’t going to see an outage on a short test.  The test would be over and done with too soon.  So, when I separated test cases into those where there were and were not outages, all the short test cases ended up in the “no outages” bucket and all the naturally longer test cases ended up in the “overlapped with a reported outage” bucket.  It looked as if cases were longer because of outages, but in reality, they would have been longer running regardless.  The relationship is incidental.
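The effect is easy to reproduce with a toy simulation.  This is only a sketch under made-up assumptions: test durations come from a skewed (lognormal) distribution, outages are dropped at random onto a timeline, and, crucially, an outage never adds a single minute to any test.  All the parameter values and names are hypothetical.

```python
import random
from statistics import mean

def simulate(n_tests=5000, horizon_hours=2000.0, n_outages=100,
             outage_hours=1.0, seed=1):
    """Split simulated tests by whether they overlapped any outage."""
    rng = random.Random(seed)
    # Outage windows are placed at random and have no effect on durations.
    outages = []
    for _ in range(n_outages):
        o_start = rng.uniform(0, horizon_hours)
        outages.append((o_start, o_start + outage_hours))

    with_outage, without_outage = [], []
    for _ in range(n_tests):
        # Skewed durations: mostly short tests, a few long ones.
        duration = rng.lognormvariate(0.0, 1.0)
        start = rng.uniform(0, horizon_hours)
        end = start + duration
        overlaps = any(o_start < end and start < o_end
                       for o_start, o_end in outages)
        (with_outage if overlaps else without_outage).append(duration)
    return with_outage, without_outage

with_outage, without_outage = simulate()
print(f"overlapped an outage: n={len(with_outage)}, "
      f"mean duration={mean(with_outage):.2f} h")
print(f"no outage:            n={len(without_outage)}, "
      f"mean duration={mean(without_outage):.2f} h")
```

Even though outages cause nothing in this toy world, the “overlapped an outage” bucket comes out with the longer average duration, simply because long tests present a bigger target.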

Anyway, satisfied that I had explained away outages, I was prepared to stop this fool’s errand; they insisted that I keep looking.  One manager said “I was there with them, I experienced it first hand.”  And all I can say in response is that the data doesn’t support the claims you are making.  But she kept on saying it.  I’m pretty sure she was determined to go down with the ship so to speak.

I realize it’s hard to revise your view of the world in the face of conflicting evidence.  Yet there are studies that show that those who are willing to change their world view in light of new data are more successful.  Indeed clinging to old unsupported views is bordering on the definition of insanity.  Say it all you want, it doesn’t make it true.


The Black Swan’s Fatal Flaw

February 19, 2009

So I’ve been reading The Black Swan by Nassim Nicholas Taleb recently.  It’s a decent book, though not significantly different from his other work Fooled by Randomness.  Having read the latter I realized he introduced a flaw to his own thinking in the former.

In Fooled By Randomness he talks about how people make up narrative fallacies for things they cannot predict.  In the aftermath we make up great, and mostly false, stories about how it could have been avoided.  “If only I had…”  The example he uses is a what-if someone had insisted that all airplanes have sturdy locked cabin doors prior to September 11th.  In that case, in that alternative world, September 11th wouldn’t have happened because someone would have planned for it.  The event would have been avoided and we’d be none the wiser to the possibility of someone crashing planes into the twin towers, the Pentagon, or the field in Pennsylvania.  The point being that we hardly ever give credit to the person who was thoughtful and avoided the disaster in the first place by being prepared for the outlier situation others hadn’t seen coming.  Instead, we reward the heroics.

At any rate, on p.130 of The Black Swan, Mr. Taleb goes into a discussion about Casinos and how they have these sophisticated surveillance systems to catch cheaters.  And yet, he points out, the biggest losses to the Casino came not from risks that they anticipated but those that they didn’t anticipate.  His point being, I think, for all the models that they have about cheaters and high rollers and so on that could strip the casino of profits, they had nothing to protect against, for example, Siegfried and Roy’s white tiger attacking them.

But he forgets about his what-if scenario around September 11th.  Clearly the casino was quite thoughtful about many of the risks from gamblers, etc.  What if, for example, they had not done any of that, would the casino have simply gone broke from cheaters BEFORE any of these drastic events could have taken place?  The problem with his own argument is that it is based on a what-if as well.  Specifically a hypothesis that all this fancy surveillance they put in doesn’t make a difference compared to external factors.  We cannot explore the alternative history where the casino didn’t have all the safeguards in place so we really don’t know this.

He argues that any one gambler’s cheating is a drop in the bucket compared to these massive unexpected events, but in a world where they weren’t controlled for, would the casino even still be around?  It could be death by a thousand cuts instead.

I’m starting to wonder if his whole case isn’t essentially there are things we don’t know that we can’t control for and when they happen, you might be wiped from the face of the earth.  And to that I say, “um, so what?”  I can’t go around living my life thinking about them.  Casinos, for example, do what is rational to control for the risks they can so they don’t get wiped out by what Mr. Taleb would call a Mediocristan thing.  However, when a tiger mauls your stage act, well you just have to learn to adapt.  Being aware that we are unaware doesn’t change my behavior; I’m still by definition unaware that the tiger has had just about enough of his captors and is about to take his revenge.  For everything I can imagine there is something, maybe several orders of magnitude more, that I cannot.

I think it was best summed up by my MBB coach who said to me about the models I was building “many models are useful, no model is perfect.”


Is that scalable?

February 17, 2009

Cost, quality and speed.  They’re the three things that everyone talks about as the main things to measure for an organization.  Today someone brought up one that I initially dismissed, but did have some thoughts about – scalability.  As processes go, we tend not to worry too much about scale.

What does scale mean anyway?  Well, if you ask our business partners they’ll tell you that scale is the ability to have the next transaction cost less than the prior one.  It is essentially a logarithmic curve where at some point no matter how much more you ask for the cost doesn’t go up very much.  This is a good model for a business that wants to have lots and lots of throughput from an ever increasing customer base.  The only way to get more money is to bring in more revenue which means taking on more clients at presumably an ever lower cost.

But for a software development process, I’m not sure that vision of scale is right.  Excluding funny sine-wave shaped curves, which one of these three curves represents a scalable process to you?

[Figure: scalability-curves1 (cost versus request size for three curves: logarithmic, exponential and linear)]

Choice 1: the logarithmic curve.  As the size of the request (whether it’s function points, or whatever) increases the cost very quickly rises until it caps out.  This would be OK if all you ever handled were really large requests.  You’d be efficient at doing big things, once they reached a certain “bigness.”

Choice 2: the exponential curve.  As the size of the request increases the cost rises very slowly until BLAM! the cost spirals into the stratosphere.  This would be OK if all you ever handled were little things.  You’d be lean and mean, so to speak, but if someone asked for something big, you’d be incapable of doing it for a reasonable cost at all.

Choice 3: the linear relationship.  The cost of things increases in a controlled manner as the size of requests increases.  I personally think this is the right “curve” for process work and the true picture of what a scalable process means.

See, business talks about scale with this grand vision that if only we had some sort of logarithmic relationship in our process that we’d be able to get rich off each subsequent client once we had accomplished “scale.”  Having recently been reading The Black Swan, I think this would be what Mr. Taleb would call a scalable endeavour.  There’s a hope that you can get a disproportionately large reward for no more effort.  For example, the way that someone can give a concert to one person or a 50,000 person stadium and they exert the same effort either way.  The inverse would be, as Mr. Taleb likes to use, the dentist, who through the diligence of drilling a lot of teeth makes a good living, but can’t simultaneously drill a million people’s teeth.  It doesn’t “scale” which in his case means, it scales very linearly (drill one tooth get a dollar, drill 10 teeth get 10 dollars, drill 100 teeth get 100 dollars) but it doesn’t scale in some grand way.

I don’t think anyone views the exponential curve as a good thing.  However, in situations where you never did anything very large, you’d by far outpace your “logarithmic” competition.  You’d just have to recognize that you can’t take on big projects and you could be quite happy steadily costing far less than anyone else for a certain size of work.

At any rate, logarithmic scale is well and good for the business, but processes, including writing code, don’t scale like that.  One developer, working diligently turns out linearly more code for each additional amount of effort.  His work doesn’t “scale”.  An extra hour doesn’t suddenly get him 10 or 100 times the code.  So, as the size of the work requested rises, so does the cost.  Some research by Gary Gack indicates that, in fact, coding may be exponential, that we actually become less efficient at doing bigger projects.

So, when it comes to process, which “curve” do you really want?  If work of varying size comes in, which it does with software requests, then your process should be able to handle them all.  Neither the logarithmic relationship nor the exponential relationship accomplishes that.  Scale for process, in my mind, means a linear relationship between the size of the request and the cost to yield that work.  Yes, that means if you ask for more work it’ll cost more, but in some linear way.  There is more “opportunity” so there is more cost.  The other two models either don’t scale up or don’t scale down.  In the linear model, process may just be able to change the slope of the line.
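To put rough numbers on the three choices, here is a small sketch that compares the cost per unit of work at a few request sizes.  The three cost functions and their constants are entirely hypothetical, picked only so the shapes are easy to compare; they aren’t fitted to any real process.

```python
import math

# Hypothetical cost models; "size" could be function points or similar.
def logarithmic_cost(size):
    return 100.0 * math.log1p(size)   # rises fast, then flattens out

def exponential_cost(size):
    return math.expm1(size / 10.0)    # cheap at first, then explodes

def linear_cost(size):
    return 10.0 * size                # the slope is the cost per unit

def unit_cost(cost_fn, size):
    """Cost per unit of requested work at a given request size."""
    return cost_fn(size) / size

for size in (1, 10, 100):
    print(f"size {size:>3}: "
          f"log={unit_cost(logarithmic_cost, size):8.2f}  "
          f"exp={unit_cost(exponential_cost, size):8.2f}  "
          f"linear={unit_cost(linear_cost, size):8.2f}")
```

Under these assumptions the logarithmic model is expensive per unit for small requests and cheap for big ones, the exponential model is the reverse, and only the linear model charges the same per unit at every size.  Improving the process would show up as a smaller slope in `linear_cost`.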


Worst survey ever

February 12, 2009

I don’t care for surveying. There are times when you need to do it, but as I learned early on, surveys are not good ways to get measurable data. Surveys are good ways to get opinions.

Around my office, we tend to use surveys to ask a bunch of people questions about measurable things. It isn’t that we want their opinion; it’s just that we’re too lazy to go and ask each one of them personally for the data we need.

We survey for all kinds of stuff that we could just measure: how many hours did you work this week? (I can tell that from time cards) How long is your commute? (I can look that up on a map)

I’m used to getting crummy surveys and for the most part not responding. The other day I got what I believe may count as the WORST SURVEY EVER! It broke almost every good rule of surveying.

Let’s review:

  1. Don’t ask multipart questions in a single question. You won’t know which part they responded to. For example, from this survey “do you have a membership in a professional organization and are you planning on renewing it?” If I answer yes, then one can assume that both are true. If I answer no, however, is the NO that I don’t have a membership or is the NO that I don’t plan on renewing it?
  2. Don’t branch. Rather than say “do you have X? if yes, skip to question 5, otherwise continue.” Instead, use multiple choice. In the survey I got, there were numerous questions that branched like so “are you going to any conferences this year? yes / no.” The very next question is “which conferences are you going to this year?” Um, well, let’s see if I answered NO to the first question then this question is irrelevant. If I answered YES to the first question, then this question simply clarifies it. It’d just be easier to have the second question “which conferences are you going to this year?” and have the options be “conference A, conference B, other (please specify), or NONE.” Save yourself a question by using the NONE option.
  3. Don’t lead. “Are you intending to renew your membership in the very important project management group?” Um, well, since you put it that way, clearly you think I should. I guess the answer, at least for your survey, is yes. In reality, I probably won’t renew, but I just won’t tell you that.

All in all, I think the survey I got was 20 questions. As far as I could tell the survey had, if well written, probably 3 pieces of information collected. All the rest of the questions just supported branching and duplicate lines of questioning. It felt a bit like the Spanish Inquisition. I was trying to figure out if they were tricking me into answering one of the lines of questioning incorrectly to trap me.

Oh, and one more thing about surveys. Surveys aren’t generally viewed as mandatory so if you want 100% response rate, there’s probably a better way to get your data. Stop using an otherwise useful tool for the wrong purpose.


Cyclomatic complexi-what?

February 10, 2009

It’s been a while since I’ve had the opportunity for a good rant.  Today provided it.

Not that long ago I wrote about the new measurement system QA was proposing – defects per test case – as a proxy for code quality.  Anyway, as part of that effort I had to then go out to all the middle managers and explain our approach and how we arrived at that conclusion.

I’m delighted to be asked to give a talk about statistical analysis whenever I can.  And it’s always a fun challenge when someone else enters the room who believes they’ve got similar or better ability.  I genuinely like having my stats knowledge tested.  But there’s another type who comes to meetings who I don’t like.  People who don’t bother to challenge the analysis but just dismiss the idea as wrong.  Take for example today.

“I have some concerns about using defects to measure quality.”

You what!?!?  That’s like saying, at least in my mind, I have some issues using rulers to measure length.  The best minds in the IT field recognize that defects, typically in the form of defect containment, are one measure you cannot live without.  He goes on…

“Quality doesn’t necessarily mean defects.  It could mean cyclomatic complexity or what value the business derives from the functionality.  If we don’t build it well and it is unusable then it isn’t high quality.”

I can feel my face turning red as he pontificates on the subject.  It’s good I maintain my cool.  For one, what the heck is “cyclomatic complexity”?  I can ascertain from the context that it is some measure of the code complexity.  What to even say.  I think two words sum it up.  WHO CARES!  I know, I know, how dare I say that.  Everyone knows that having maintainable code is good for the customer down the road when they request changes.

Let’s be honest.  Option one is the Agile approach where you build only what you need and no more.  If you end up having to add a feature that breaks the design you “refactor.”  If you are doing waterfall, you try and anticipate some future need to reduce the risk of having to redesign, but eventually, your model will be broken and you’ll have to refactor or band-aid it forever.  You’re not going to get away without rewriting parts of the system.  Ever.  The complexity of your system will rise and fall over time.

Anyway, more importantly, and here’s the secret.  In the immediate need, YOUR CUSTOMER DOESN’T CARE ABOUT CYCLOMATIC COMPLEXITY!!!  You can write the crummiest spaghetti code that has ever been authored and if it meets your customer’s need, and they don’t modify it, they’ll never know.  They’ll never care.  Having high cyclomatic complexity might (and I stress MIGHT) be a leading indicator of future cost, but I can’t think of a single customer I ever encountered who said “I won’t be happy unless this product has low cyclomatic complexity.”  Don’t confuse your idea of what quality means with what your customer says.

On the other front, business value is a ridiculous way to look at the quality of what you deliver.  Now don’t get me wrong, you must be unswervingly focused on what your customer wants.  That said, as an organization we have people whose job it is up front to analyze the market need, determine the potential value, gather cost information to assure an adequate CBA and finally specify the product that will solve that need. 

I know you are going to be morally offended, but once it hits systems, you pretty much should be thinking of yourself as an order taker.  Yes, an order taker who can make some suggestions, but essentially an order taker.  You have no control over the value side of the equation.  You didn’t specify the product needed.  The business did.  If the business had some ill-conceived idea of how the solution ought to work that the customers won’t find valuable, it isn’t the responsibility of systems to second guess it.  The entire process of building something isn’t a forum for intellectual discourse.  Eventually someone has to do something and build it!  Creating waiting or rework by debating the request is not lean and if nothing else it’s presumptuous that systems understands the needs of the business better than the business.

Could you imagine on the floor of the Toyota plant some guy down there saying “you know, if we built go-carts instead of cars, I think people would be happier.”  And then proceeded to build a go-cart instead of a Camry?  I sure as heck wouldn’t be excited when they tried to deliver THAT product to me!  There’s always room for innovation on the factory floor, but it’s about ways to better and faster put together the car, not making the car more “usable.”

So sure, ultimately a quality product is something that people want to use.  That’s the “do the right thing” part of the equation.  That’s the strategy of selecting the right product to build.  In terms of the factory floor – analysis, design, code and test – it’s all about tactics “doing the thing right.”

Which brings us back around to our customer, who specifically said (yes, we were smart enough TO GO ASK OUR CUSTOMER!) that defects were their key measurement of the quality of the product.  Funny that cyclomatic complexity didn’t come up…


Prove me wrong

February 7, 2009

I’m not much of a reader honestly.  I know that seems strange, but I like to take information in small quantities, like articles, short chapters, etc.  Works that need me to read from beginning to distant end to get the whole story don’t keep my attention.  So it’s a bit odd that I would be reading “Fooled by Randomness” recently, but it was given to me by a coworker and I felt some obligation to read it.  So far, so good, actually.  It’s a little less data than I like, in fact holding what might amount to a handful of truths wrapped by long, long prose.

Regardless, as I read through I came upon “the black swan” which finally allowed me to tie together a blog entry that I’ve been hanging onto for a long, long time.

So, here’s the idea of the black swan.  “I have never seen a black swan, therefore no black swans exist.”  This is a very difficult statement.  On one hand, it might be true, for through sampling of the population of swans randomly, you’d probably not see one.  They do in fact exist, in Australia, but even with a really large sample, if you never took a swan from down under, you might conclude that no black swans exist.

The author’s point is that the converse statement is much easier.  If you can find even one black swan you can confidently make the statement “not all swans are white.”  Say perhaps you sampled as few as 2 swans, and one was white and one was black.  Tada, your point has been proved.  On the other hand, the absoluteness of the first statement, no matter how large your sample size, until you can observe the entire population of all swans that have ever or will ever exist, cannot be proved.
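A quick back-of-the-envelope sketch of why a big all-white sample still proves nothing.  It assumes swans are drawn independently at random and that some fraction of all swans are black.  The 0.001 used here is a made-up number, not a real estimate.

```python
def all_white_probability(black_fraction, sample_size):
    """Chance that a random sample of swans contains no black swan at all."""
    return (1.0 - black_fraction) ** sample_size

# Hypothetical: one swan in a thousand is black.
for n in (10, 100, 1000, 10000):
    print(f"sample of {n:>5}: P(no black swan seen) = "
          f"{all_white_probability(0.001, n):.4f}")
```

The probability shrinks as the sample grows but never reaches zero, and if you only ever sample where no black swans live, the effective fraction is zero and the probability stays at 1 forever.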

Anyway, hearkening way back to the Fixed Price Shop entry we encounter a similar story.  I had concluded with a high degree of confidence that all estimates fell within a very small range of results and therefore we were conclusively a fixed price shop.

Karl Popper coined the term falsifiability.  It is the ability to prove something false.  Scientific research is falsifiable; religion is not.  That’s not to say that religion is correct, but because it is untestable it can be neither proved nor disproved.  I’m probably paraphrasing too much, but essentially the point of falsifiability is that we take a hypothesis and attempt to beat it to death to prove ourselves wrong.  By doing so, once we have excluded the alternative hypotheses which make our “discovery” a non-event, we can finally call what we have a discovery.

So here we were with me claiming to be a fixed price shop.  All my observational data said so!  That’s when we needed to exclude the other possibilities.  See, being a fixed price development shop meant that regardless of the work requested that we’d produce the same estimate anyway.

So, what would that mean?  Well, it’d have to mean that there was no correlation between the requirements and the price paid for the work.  The first issue is that statistics cannot tell you that there is a non-event; it can simply tell you that there isn’t an event.  It’s subtle, I know, but the point is that just because you can’t find statistical evidence of a relationship doesn’t mean the relationship doesn’t exist.  But let’s assume for a minute that because I couldn’t find a relationship that it meant something.  As a non-event option, however, it could mean that all projects were simply the same size.  If people always requested about the same amount of work, then it’d make sense that the projects cost about the same.

I admit, it is hard to figure out how big the requirements are, but I just did something simple.  I took a sample of projects and counted the number of rows of requirements in the documents.  It was very convenient that most of our requirements documents are written in a nice list format.

Indeed, I could find no correlation between the cost of my sample projects and the number of requirements.  It sure seemed that the number of requirements varied but the cost did not.  The alternative hypothesis that all requests were the same size was defeated, right?  Wrong!

What if I found no relationship between the two not because there was no relationship but because my method of counting was no good.  Maybe all work really was the same size and the way I sized the work was bad?  How could we prove that?

Well, we knew that the method for estimating was essentially: read the requirements, estimate the development effort, tack on all the other surrounding resources.  In fact, this is exactly why we believed that the estimates were fixed price.  There was a variable price to the project (it is the development work) but it was dwarfed by all the garbage that people put around it.  The end result could vary very little because the signal was essentially drowned out by the noise.

Still, there was a potential non-event here with the prior research.  We might have a black swan.  We knew that the sample projects’ requirements bore no relation to the costs of the sample projects, but we’d never found a case where it did.

Well, I couldn’t wait to observe all possible projects that ever were or ever will be, so we had to figure this out somehow.  Recalling that the estimation methodology, in theory, estimated the development portion of the work off the requirements, if we could show that requirements to development effort were correlated but that requirements to overall effort weren’t correlated, that would show that we were correctly observing the process.

So, we tested the same counts of requirements against just the development portion of the effort.  And wouldn’t you know it, the counts of requirements were highly correlated to cost of development!
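Here is a toy version of that check, to show how a real signal can vanish once enough unrelated effort is piled on top of it.  Everything in it is invented for illustration: the project counts, the 4 hours per requirement, and the size of the overhead are assumptions, not figures from the actual estimates.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(7)
n_projects = 200

# Hypothetical projects: rows of requirements counted per document.
requirement_counts = [rng.randint(10, 200) for _ in range(n_projects)]
# Development effort tracks the requirement count, with modest noise.
dev_hours = [4.0 * count + rng.gauss(0, 40) for count in requirement_counts]
# Surrounding effort is large and unrelated to the requirements.
overhead_hours = [rng.uniform(2000, 12000) for _ in range(n_projects)]
total_hours = [dev + extra for dev, extra in zip(dev_hours, overhead_hours)]

r_dev = pearson_r(requirement_counts, dev_hours)
r_total = pearson_r(requirement_counts, total_hours)
print(f"requirements vs development effort: r = {r_dev:.3f}")
print(f"requirements vs overall effort:     r = {r_total:.3f}")
```

In this made-up setup the requirement counts line up closely with development effort, which suggests the counting method is measuring something real, while the same counts barely register against the overall effort.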

It’s a long way round to get to a simple point.  Once you find that you have proof of something, it is your responsibility to posit the non-events that could undo your discovery.  Then, test the non-events to see if you’ve just found something mundane or something truly interesting.  And then, if necessary, test the non-events of those non-events.  In my story, though I left out some paths for brevity, it goes as follows:

Statement: All projects cost approximately the same because we don’t estimate the work, we just put in arbitrary efforts. 

Possibility: But, what if really what is happening is that all work requested is just the same size? 

Statement:  But we can show that the number of requirements varies from project to project and has no correlation to the cost.

Possibility:  But what if really what is happening is that your counting of requirements is no good?

Statement:  But we can show that the counts of requirements are highly correlated to the development effort, but not to the overall effort.

It’s my job to prove me wrong as part of a thorough analysis.  It’s my job to explore the possibility that I have found nothing, to show that my claims are potentially falsifiable.  And then to show that I can refute those claims of non-events.  It’s my job to cut down my own research until the alternative explanations have been pruned away.

It’s your job as well.  Try to prove yourself wrong, or you can be sure that someone else will.


Good to be unskilled?

February 4, 2009

Have you ever wanted someone to say this about your company “wow, we are really horrible at doing X”?  And I don’t mean in a thank-god-they-are-admitting-they-have-issues kind of way, but more in a “I’m really proud we can’t do that well” kind of way.  Up until recently, I thought the answer to my question was “are you crazy!?!?  Of course not!  I want us to do everything really well.”

Like many large companies, the one I work for is undergoing a tough time due to the economy.  And of course, that means layoffs.  I’ve been fortunate to have been unscathed as of yet, and particularly thankful since my well of new ideas would dry up quickly if I wasn’t actively involved in process work.

Anyway, unfortunately, people that I liked at work were not so lucky as I was.  You get around to talking with these folks about how the experience of being laid off is.  I mean, it can’t be fun, but you want to know if it went relatively well.

Of course, it doesn’t go well.  I realize there’s no good way to lay someone off, but there are less bad ways.  I’ve heard horrid rumors of other companies laying people off via email and simply just locking the doors to the building.  We were nowhere near that far down on the scale.

But there are always things you can do better.  For example, the process of laying people off starts first thing in the morning and continues until everyone has been told.  But, since you have no warning as to whether you are going to be laid off or not, those of us who kept our jobs sit around in our offices panicking that we’re next.  At some point during the day it is over, and wouldn’t you want to know that?  We heard nothing until hours after the last layoff had been done.

Unnecessary hours, in my opinion, that I should not have had to spend worrying needlessly.  I know, I know, think of how the people who were laid off felt.  Was it really that bad of a thing to leave me wondering?  No, not really, but it could have been done better.

Later on that evening, I considered how lucky I was that our company is terrible at communicating during layoffs.  Why are they so terrible?  Well, it’s a rare occurrence.  If you don’t practice it, even if you learn from a prior experience you never get to apply those learnings.  If my company was expertly prepared to do layoffs, I’d be a little worried.

Sure, isn’t it great that they are super-capable?  No!  It’s awful!  It’s something they shouldn’t be doing, something that they have rarely had to do.  They should be god-awful at it.  Frankly, it hurt at the time, but now I’m downright pleased.

And not just communicating layoffs applies here.  Disaster recovery of all forms might be fair game.  I mean, if you’ve gotten your system, process, product, whatever it is, so reliable that you’re unprepared for when it fails, that might actually be a good thing.

If you fail all the time, you’ll have the people and processes in place to deal with the failure.  You’ll be expert fire fighters, and that’s just not the place you want to be.

I’m not offering a free pass to companies for not being capable of dealing with a disaster that is likely to occur – like your servers or network going down – but at some point if you’ve really gotten good at something, I’d expect you’d be bad at dealing with the outlier.

Could it be good to be unskilled at something you shouldn’t be doing in the first place?  I think so.