Leaner today

June 17, 2009

My wife and I are attending a wedding in the next few weeks. That’s a generally unremarkable occurrence. We’re about the age where our friends are getting married and starting to think about kids. In fact, I generally dread these events, since we’ve been to so many and each one costs us a gift, a hotel room for a night or two, a sitter for our daughter, yet another new dress for my wife, and the list goes on…

Going to weddings is expensive. At any rate, we had not actually been to a wedding in quite a while. I have recently lost some weight, so I decided to try on my suit and make sure it still fit. Alas, it did not, so adding to this wedding’s tab will be a tailoring of my suit.

At my lunch break I popped down to the local tailor (who is just the stereotype I imagined sitting in his little shop with his thick eastern European accent). He has me put on my suit and stand up on the little, what do you call it, soapbox, I guess, to have a look see. I mention in passing that I’ve lost some weight recently.

He’s kind of tugging here and there, getting a sense for it, mumbles something under his breath about it being a nice suit (which I appreciate). Eventually he looks up at me and says “you’ve lost 30 or 40 pounds, no?” I smile. Hey, just because I’m a guy doesn’t mean I can’t appreciate this compliment.

“No,” I reply, “I don’t think that much.”

“Ah, it must have been a little too large to begin with.”

“Hmph,” I think. Some nerve. First he compliments my weight loss (though I admit I had not lost 30 or 40 pounds, or at least I don’t think so) and then suddenly it’s a suit that was always too large…

Too large to begin with, eh?  That reminds me of something. Measurements! Recently at work, someone asked me how we compared to our competitors in regards to development efficiency. Setting aside the fact that nobody can agree on exactly how we should measure efficiency, I reply “what does it matter?”

“Well don’t you want to know how we’re doing?”

“Is our customer happy with our performance?” I ask.

“No.”

“Then it doesn’t matter how our competitors are doing.  We are not doing well enough.”

It’s like my suit. Sure, I’m too small for my suit to begin with. Compared to my suit, I am leaner than the suit would hold, but am I happy with my weight? I guess I’m ok with it, but I could stand to lose a few more pounds.

By comparison, we may be better than our competitors when it comes to development efficiency. The suit sized for our competitors is too big for us, so to speak. Alas, it doesn’t matter, because the customer (or in the case of my suit, my wife) doesn’t really care that you are good in comparison. Sure, I guess maybe my wife is glad that I’m not the thousand pound or five hundred pound or even two hundred pound man coming down the street, but I still could be a bit leaner than I am. And so can your company. It’s not success just because you beat everyone else. If it isn’t good enough for your customer, it isn’t good enough.


Bring me problems

March 2, 2009

I have endured the ultimate insult!

One of the things we have tried to do in our organization is to remove one of the forms of waste, specifically intellectual waste.  LEAN, as far as I knew it, recognized seven forms of waste.  That was, until I saw a recent presentation about it that included an eighth form – intellectual waste.  Intellectual waste is having highly skilled workers, or specialized workers doing unintelligent work.  For example, your best sales guy being forced to data enter sales into the database rather than having an intern do that.  In the same vein, having skilled black belts hand creating charts and graphs for MBFs is waste as well.

Anyway, the company has generously, and I do truly appreciate it, invested quite a bit of time and money in me to train me as a black belt and get me certified.  But I come from a developer background, then a project manager and only most recently a process person.  I haven’t forgotten my roots; I still develop snippets of code quite often.

In fact, my ability to write code, SQL queries, and the like differentiates me from my peers in a significant manner.  While my peers are forced to utilize convenience sampling, I can select every transaction out of the database and either do proper random sampling OR take it one step further and just look at our entire history.  It also means that I can solve a form of waste for the team.  I can quickly automate queries and processing.  Alas, it is a blessing and a curse.

In turn, I have tried to share that capability with my peers by agreeing to write queries for them when necessary.  In my mind, it makes us all better to have access to better data and to not be hand-making charts instead of doing analysis.  I don’t consider programming to be intellectual waste on my part.

However, where we have gone with the programming I do consider waste.  We are a team of black belts, all of us trained and certified.  The program, like at many organizations, has been struggling and people don’t often know what to do with us.  So, often instead of problems we get requests for solutions.

People treat us as project managers and report jockeys.  They give us a thing and say “go implement this” or “report on this.”  Who hires black belts to be do-ers?  You can get a project manager anywhere.  You can get a reporting team anywhere.  This is not of value to the company to use us as shadow staff!  You have people whose full time job it is to do this!  It drives me nuts when it happens and it culminated in the ultimate insult.

Having already lowered ourselves to producing reports rather than MBFs, we had built up a set of queries which someone else wants to replicate.  Honestly, I was glad for that because if they were reporting it, we could stop.  And then we could return to something with more value.  But it wasn’t the queries themselves that made me mad, it was a passing comment about why we knew so much about the data.  “Are you guys in some sort of reporting team?”

AAAAGH!!!  I nearly lost it.  I could feel my face turn flush, so it was good that I was on the phone.  Some sort of reporting team?  Reporting team!?!?  How did we get here?  We got here because in order to do our jobs we needed to understand our measurement systems better.  People made the bad jump from “hey, they understand the transaction database” to “hey, they could produce a report faster than me making a request through normal channels.”

I will admit, as I said, a capability can be both a blessing and a curse.  And that’s what happened when I shared my development capability with the team.  The team took advantage of it and in turn shared that capability with people who just needed some reporting.  We diluted our own value by not recognizing the intellectual waste we were creating.

Bring us problems to solve, not things to do.  There are lots of less trained people who can do things for you and at least you’ll pay a fair market rate for the work too.


Overcomplicating Things

January 21, 2009

How overjoyed I was when I was approached by a senior manager and his team to help develop a true MBF.  They were interested in the end to end quality of the products they test.

They had gone out to their customers and gotten VOC, and their customers told them what quality meant to them.  Though there are many things one might consider quality code - maintainability, stability, etc. - our customers overwhelmingly told us one thing: NO DEFECTS!

Great, we knew what our customers didn’t want.  Now, what’s the opportunity to have a defect?  Oh… well… um… unfortunately that question hasn’t been well answered by the software industry.

I do want to digress and point you towards a presentation I saw given by Gary Gack today regarding measuring software productivity which I thought was very interesting.  Productivity and quality both need that same “opportunity” measure to make them meaningful.  A good measure of the quality of a product is (defects / opportunities) and a good measure of productivity is (effort / opportunity).  In this case, opportunity might mean lines of code or function points or who knows what.  Mr. Gack presented the standard case against these types of measures – lots of challenges abound.  He argues that instead of measuring the opportunity (size of the work) measure how leanly you do the work.   Of course, that fixes productivity but does nothing for quality of code.  We’ll come back to that in my next post perhaps.

Anyway, long digression… the problem was just like everyone else, this QA department didn’t know what the opportunity was.  So they chose one.  GASP!  Horrors, you say!?!?  I disagree for one simple reason.  Rather than avoiding the measurement because they didn’t know what exactly an “opportunity for a defect” might be, they chose to acknowledge that whatever operational definition they came up with would be imperfect AND they would work to counteract its imperfections with additional measures to balance the scorecard.

By balancing, in this case, I don’t mean having measures for cost, quality and speed, but instead having measures that counteract or watch over the potential gaming that could be done to the main measurement.

Up to this point, I am a happy boy.  Nay, I am practically shedding tears of joy.  Never before in this organization have I seen some senior leader actually take this kind of initiative to try and at least get us in the ballpark when it came to where our quality stood.

So, let’s dissect the proposed measurement and see where it went awry.  I’m happy to say up front that they performed a proper analysis of where they ended up and corrected it before moving on.

So, back to the original question… what is the opportunity for a defect?  Most people choose lines of code (LOC) or function points.  This team started with something they felt confident they could measure – test cases.

I know, I know.  Test cases is no measure of opportunity… or is it?  In our world, it well may be.  We perform testing looking for good coverage over all the code created, so QA examines the requirements, writes scenarios and test cases from those requirements and executes them all.  Since QA does not selectively not test (ie perform risk based testing), the number of test cases written probably correlates pretty well to the amount of opportunity.  Yes, it is true, a test case is an opportunity to find a defect, not an opportunity to create a defect.  It’s a subtle distinction, but important.

Of course, the team took it further.  What about the complexity of the project and what about the number of people involved?  Aren’t these things important indicators of how much opportunity there is for a defect as well?  After all, just like you probably initially reacted to it, test cases seems like a bad way to measure opportunity.  So they added those in to create their opportunity measure called “Weighted Test Cases (WTC)”.  WTC is Test Cases * Project Size * Project Complexity.  Ignore for a moment how size and complexity are figured out.  Ultimately, their output measure of quality for a project would be (Defects / WTC).
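As a rough illustration of the team's first proposal, here is a minimal Python sketch of the WTC arithmetic.  The function names and the sample numbers are hypothetical; the post does not say how size and complexity were scored, so they are simply treated as numeric multipliers here.

```python
def weighted_test_cases(test_cases, size, complexity):
    """Opportunity proxy from the post: Test Cases * Project Size * Project Complexity."""
    return test_cases * size * complexity


def defects_per_wtc(defects, test_cases, size, complexity):
    """Quality measure from the post: Defects / WTC."""
    wtc = weighted_test_cases(test_cases, size, complexity)
    if wtc == 0:
        raise ValueError("WTC is zero, so the ratio is undefined")
    return defects / wtc


# Hypothetical project: 200 test cases, size score 3, complexity score 2, 12 defects.
print(weighted_test_cases(200, 3, 2))   # 200 * 3 * 2 = 1200
print(defects_per_wtc(12, 200, 3, 2))   # 12 / 1200 = 0.01
```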

Here’s my sanity check for “do I have the right opportunity.”  Do a correlation test.  That’s the whole thinking behind having the opportunity as part of your measure in the first place, right?  If I have more opportunities, then having more defects doesn’t necessarily mean I’m doing worse.  Fewer opportunities, fewer defects.
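The sanity check can be sketched in a few lines of Python.  This assumes a plain Pearson correlation is an acceptable form of "correlation test" (the post does not name a specific one), and the per-project figures are made up purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-project data: candidate opportunity measure vs. defects found.
opportunities = [120, 340, 80, 560, 210]
defects = [9, 25, 7, 41, 14]
r = pearson_r(opportunities, defects)
print(round(r, 3))
```

A value of r near 1 suggests the candidate opportunity measure moves with the defect count; a value near 0 suggests it is not a useful denominator.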

After they got their data together for their first go at it, we did some analysis.  First we looked at the WTC denominator of this equation.  Test Cases * Size * Complexity.  Hmm, size… test cases… might these two things be related?  I mean, after all, if you have more test cases you probably need to exert more effort to run all those tests.

Indeed they were, and quite strongly.  Ok, so either size or test cases doesn’t belong in the denominator.  Since test cases was our starting point, we dropped size.  And for good measure we dropped complexity as well.  Now we had Defects / Test Case.  This seemed better.

Next, we went after the numerator.  Are all defects created equal?  No, I’m afraid not.  Some defects have a high severity, some a medium and some a low.  So maybe weighted defects (defect * severity) is closer to what we want instead?

Sure enough, when we compared the correlations of Defects to Test Cases and Weighted Defects to Test Cases, the Weighted Defects came out with a stronger relationship.  Interesting!  So it appears that more test cases doesn’t just mean that you’ll get more defects, but you’ll also get more defects of greater severity.  This makes sense intuitively.  Bigger projects are more complicated and have more chance to make big mistakes.  (By the way, that same Gary Gack presentation alluded to something similar in productivity.  Larger projects have lower productivity on average than smaller ones.  There’s a nonlinear relationship between the two.)

Finally, we arrived at a simpler solution. From Defects / (Test Cases * Complexity * Size) to Weighted Defects / Test Cases.
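A minimal sketch of the simplified measure follows.  The severity weights (3, 2, 1) are an assumption for illustration only; the post does not give the actual weights the team used.

```python
# Hypothetical severity weights; the real values are not stated in the post.
SEVERITY_WEIGHTS = {"high": 3, "medium": 2, "low": 1}


def weighted_defects(severities, weights=SEVERITY_WEIGHTS):
    """Sum of severity weights over all defects found (defect * severity)."""
    return sum(weights[s] for s in severities)


def weighted_defects_per_test_case(severities, test_cases, weights=SEVERITY_WEIGHTS):
    """Simplified quality measure: Weighted Defects / Test Cases."""
    if test_cases <= 0:
        raise ValueError("test_cases must be positive")
    return weighted_defects(severities, weights) / test_cases


# Hypothetical project: four defects found across 140 test cases.
found = ["high", "low", "low", "medium"]
print(weighted_defects(found))                     # 3 + 1 + 1 + 2 = 7
print(weighted_defects_per_test_case(found, 140))  # 7 / 140 = 0.05
```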

And lastly, the team added some countermeasures.  Why?  Well, “test cases” is a proxy for opportunity, but only if the testing process remains stable.  If testing gets better (ie, more defects found per test case) then the quality of the code looks worse even though it may not be.  If testing gets worse (ie, “hey, we can make the denominator bigger if we write lots of teensy test cases”) then quality would look artificially better.  So, the team added another measure – defect containment rate (DCR) – a Capers Jones favorite.  By having DCR alongside the quality, if containment went up in proportion to the change in quality, then we’d know that quality remained the same while the appraisal process (testing) had improved.

And on the other side we decided to measure effort / test case executed.  Since smaller test cases would take less effort to execute, if we saw a drop off in the effort per case we would know that people were trying to increase the denominator of our main measurement artificially.
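The effort countermeasure might look something like this in Python.  The 20% tolerance is an arbitrary assumption chosen for the example, not a threshold from the team, and the function names are hypothetical.

```python
def effort_per_test_case(effort_hours, test_cases_executed):
    """Countermeasure from the post: effort / test case executed."""
    if test_cases_executed <= 0:
        raise ValueError("test_cases_executed must be positive")
    return effort_hours / test_cases_executed


def possible_test_case_splitting(baseline, current, tolerance=0.20):
    """True when effort per case fell by more than `tolerance` versus the baseline.

    A sharp drop hints that test cases are getting smaller, which would
    inflate the denominator of the main quality measure.
    """
    return current < baseline * (1 - tolerance)


# Hypothetical periods: 400 hours over 200 cases, then 420 hours over 350 cases.
baseline = effort_per_test_case(400, 200)   # 2.0 hours per case
current = effort_per_test_case(420, 350)    # 1.2 hours per case
print(possible_test_case_splitting(baseline, current))  # True, since 1.2 < 1.6
```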

Alas, those last two paragraphs have little to do with my takeaway lesson.  Ready for this one?  Start simple.  Even though you can imagine why something so basic as “test cases” isn’t a good proxy for the opportunity to create a defect, you could be wrong.


What’s your place in the world?

June 25, 2008

Comparatively speaking, of course, do you know where your organization stands?  How about on your software development process?

I just got out of a meeting today where someone was fretting over the fact that even if we had data, nobody would ever believe it.  The data cannot be trusted, they all say.  It isn’t accurate, they say.  What they’re saying is there’s noise in the data and therefore the data is useless.

Who cares!?!?  I’ve already written about data and the presence of noise and the fact that noise is not a killer.  The takeaway from data is not always about your absolute place, but more about where you relatively stand.  If today we make decisions in the absence of data, then the presence of (relatively good) data should help us make better decisions.

It is irrelevant if the data is absolutely perfect.  If the amount of noise does not overwhelm the signal, then data can still provide us a direction and a sense of where we stand in the world relatively speaking.

Relatively speaking is what it’s all about – am I conclusively better than I was yesterday?  Am I worse than I was yesterday – should I take action?  How do I fare compared to my competitors – will I lose in this marketplace if I don’t change?  I don’t actually care if I’m the 10th or 11th best in the world based on the data.  All I want to know is if I am discernibly different from the guy who’s number 1.  If so, take action.  If you can’t tell if there’s a difference, frustrating as it may be to sit on your hands, you probably shouldn’t take action.
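One crude way to ask "am I discernibly different?" is to compare the gap between two averages with the noise around them.  The Python sketch below uses a two-standard-error rule of thumb, which is my assumption and a heuristic rather than a formal hypothesis test; the sample values are invented.

```python
import math
import statistics


def discernibly_different(a, b, k=2.0):
    """Rough signal-vs-noise check on two samples.

    Returns True when the gap between the sample means exceeds k standard
    errors of that gap. This is a heuristic, not a formal significance test.
    """
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least 2 observations")
    gap = abs(statistics.mean(a) - statistics.mean(b))
    std_err = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return gap > k * std_err


# Hypothetical cycle times (days): ours vs. the number-one competitor.
ours = [10.2, 9.8, 10.5, 10.1, 9.9]
leader = [8.1, 7.9, 8.4, 8.0, 8.2]
print(discernibly_different(ours, leader))    # gap is large relative to the noise

# Hypothetical noisy samples with the same average: no discernible difference.
noisy_a = [10, 6, 14, 8, 12]
noisy_b = [9, 13, 7, 11, 10]
print(discernibly_different(noisy_a, noisy_b))
```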

But it’s about relativity, not absolutes.  Conversely, if you are the number one company out there, but you can’t tell your performance apart from number two, you probably want to start thinking about how to make the gap apparent.  Have you actually lost your first place ranking?  Maybe, maybe not.  Who cares!?!  When you can’t tell anymore, it’s time to make it so you can tell.

Think relatively speaking.  Think about the direction the data points you, not exactly where you are standing.


A disturbing lack of integrity

April 2, 2008

Of all the skills that a Six Sigma person needs to be successful, I’d say two things stand out.  The first is the ability to analyze data for statistically (and practically) significant patterns.  The other skill is change management.

So, when I observed a person using the first skill to manipulate the data to enable the second skill I saw a person treading a very dangerous line.  A healthy foundation of change management is the trust that you have built with the folks you are trying to influence.  You use data to make your case, build trust in your analysis and show that you are not being manipulative.  And sadly, as I’ve just observed, sometimes people (mis)use that trust to hide the data that doesn’t fit their case.

Where I work, Six Sigma is still a fairly new concept.  If an individual in the Six Sigma organization misuses the trust we’ve built up, s/he can do more damage to that trust in a single act of subterfuge than we can hope to rebuild with months or years of honest, forthcoming behavior. Lack of integrity on the part of even one individual in your organization has consequences that reach far beyond the individual.

Here’s what I saw today.  As a team, we were working towards a presentation to a group of senior managers.  Each project manager was going to have to present his/her project, goals and results on a one-page slide and then speak to it for just a few minutes.  This was purely information sharing.

One of the charts showed a stunning rise in First Pass Yield of a given type of transaction.  It looked kind of like this:

bad-statistics.jpg

What a great story!  First Pass Yield from the 30-40% range to 70%!  Who wouldn’t be excited?  Well, who wouldn’t be excited except that this chart is truncated.  Here’s what the data looked like a few months after May…

good-statistics.jpg

In June through September, the numbers were better than they were in January through April, but only marginally so.  I haven’t calculated control limits, but I doubt that, excepting May, it’d even show special cause variation.  It wasn’t nearly the great success story the first chart suggested.  Now, one might chalk this oversight up to optimism.  We could have simply claimed success too soon in May.  But we weren’t preparing this presentation in May of 2007.  We were preparing it today, nearly one year later!  We knew that the gains in May didn’t hold; we had the data in our hands which showed it.  And yet, one of my peers was prepared to show a slide which conveniently cut off the data before it went south.

What’s worse is that we make all our data, including this disastrous collapse in first pass yield, available online.  It wouldn’t take much for one marginally smart individual to connect the two and say “hey, that isn’t what you showed us in that presentation!”  And, if that realization gets back to the right (or wrong, as it were) person(s) then our entire group’s credibility is damaged.
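For anyone who does want to calculate control limits on a series like this, here is a minimal Python sketch of an individuals (XmR) chart, which puts the limits at the mean plus or minus 2.66 times the average moving range.  The monthly values are hypothetical, shaped only to resemble the story (30-40% early, a spike to 70% in May, marginally better afterwards); they are not the real data.

```python
def xmr_limits(values):
    """Lower limit, center line, and upper limit for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least 2 points to compute a moving range")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_moving_range = sum(moving_ranges) / len(moving_ranges)
    spread = 2.66 * avg_moving_range  # standard XmR constant (3 / 1.128)
    return center - spread, center, center + spread


def points_outside_limits(values):
    """Indexes of points beyond the control limits (possible special cause)."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical first pass yield (%), January through September.
fpy = [35, 32, 38, 36, 70, 40, 42, 39, 41]
print(points_outside_limits(fpy))  # [4], i.e. only May falls outside the limits
```

With these made-up numbers only the May spike is flagged; June through September sit inside the same limits as January through April.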

It may be convenient to lie now because it gets you your Black Belt certification, but these kinds of actions serve only to undermine the longevity of not just your, but all our careers.  Change Management, and the required trust to accomplish it, cannot be sacrificed for short term gains or personal reward.  Once the trust is undone, you can be sure that nothing else you try will ever make a difference because nobody will allow you to implement a change again.

It takes an enormous amount of integrity to do the right thing, but it is better to admit that more has to be done to meet your customer’s CTQ than to sweep it under the rug.  Your lack of integrity will catch up with you.


Not using run charts

April 1, 2008

Run charts are a great thing, don’t get me wrong, but I’m starting to wonder if they have as much of a place in the monitoring of software development processes.

It stems from two issues:

  1. We don’t produce a lot of widgets
  2. The widgets we do produce take varying lengths of time to produce.

So we’ll pretend I’ve got the following projects, numbered 1 through 6.

Sample Project Timelines

Let’s say I make a process change in Q3 of 2007 to improve the requirements process.  My research indicates that it should result in a reduction of defects found in production at the end of each project.  Since requirements happens very early on in the project lifecycle, projects 1, 2, 3, and most likely 4 will have operated under the old requirements process.  If I were using a run chart to display defects found in production, it’d take until the end of Q2 2008 before all the “old process” projects made it out into production.  If I lumped all the production defects in a run chart, the old process projects would continue to skew the production incident data for at least another month or two beyond their install dates.  For a period of time, the old process projects could create sufficient noise so that I could not detect a difference.

So, I could separate the projects out into two pools – old process vs. new process run charts.  But the new process only has two projects under its belt by the end of Q3 2008.  Data would be so sparse that it’s meaningless. 

Displaying process data by months when the process doesn’t fit nicely into a fixed period just doesn’t make sense to me.  Even more so when it comes to the Xs.  If I’m making hundreds or thousands of widgets, and a few go bad because an X moves out of control resulting in defectives, I can take countermeasures and save the vast majority of the widgets.  If an X moves out of control for a project and it isn’t recognized until the run chart is calculated some number of weeks later, you could be out hundreds of thousands or millions of dollars trying to correct it.

Process management in a software development world collides with project management.  If we understand what process variation will result in projects going badly, then we should observe and act on a project by project basis, not some arbitrary monthly aggregation in a run chart.

I’m on the hunt to devise an MBF which can adequately represent individual projects that show characteristics of being in trouble.


If you can’t distort the data, just don’t look at it

March 26, 2008

A sad reminder of the state of affairs comes from Curious Cat Management when it comes to conflicts of interest between the data and one’s own goals.  It is challenging to act with integrity when you are able to manipulate the facts to meet your own ends.  Just yesterday I ran into a baffling extension of distorting the data:  if you have data which tells a bad story about your own organization, hide it!

In my case, I was discussing a proposed MBF with a superior of mine.  I think like many organizations that deliver services we rely on three key areas of measurement:  Speed of delivery, cost of delivery and quality of delivery.  For some reason, we rarely measure the functionality of the delivery, but that’s another story…

Ignoring speed for the moment, there exists a paradox between the cost and quality aspects of delivery.  It is generally thought that driving up quality will result in an increase in cost as well.  The innovation in a process is breaking this supposed relationship – drive up quality and drive down cost.

For that reason, I proposed two measures as the basis for our MBF.  The first we already use today in some form:  defects per thousand lines of code (KLOC).  My proposal for the second was cost per KLOC.  I’m going to put aside the concerns about whether KLOC is a valid way to measure productivity.  While function point analysis resolves many of the concerns regarding lines of code metrics, our organization lacks the maturity to think in those terms.  Fortunately, Capers Jones gives us a nice table in Applied Software Measurement which allows us to compare differing languages on a level plane.
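Both proposed measures come down to the same normalization, sketched here in Python.  The project figures are hypothetical, and the sketch leaves out any language adjustment, since the values from the Capers Jones table are not reproduced in this post.

```python
def per_kloc(amount, lines_of_code):
    """Normalize an amount (defects, cost, ...) per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return amount / (lines_of_code / 1000)


# Hypothetical project: 45 defects and $180,000 of cost for 30,000 lines of code.
print(per_kloc(45, 30_000))        # 1.5 defects per KLOC
print(per_kloc(180_000, 30_000))   # 6000.0 dollars per KLOC
```

Tracking the two numbers side by side shows whether quality and cost are moving in opposite directions or whether a process change improved both.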

Regardless, we don’t measure productivity at all today.  To me, it’s just obvious – any measurement of productivity is better than having no measurement at all.  Unless of course, you are heading up the organization whose productivity is abysmal.  And that is what I ran into the other day.  Why was my proposed MBF nixed?  Had I stumbled onto something that my superior did not want to share?

I’m still seething that someone would turn a blind eye to a productivity problem for fear of how it might reflect on them.  People are not stupid, and if you choose to never measure something, you lack any basis to dispel myths about your productivity or compare it to your peers.  Sure, it sucks to be the worst, but this is also your motivation to not be in last place.  The presence of data (even in the worst MBFs) drives action.  The lack of data allows you to put your head back in the sand and pretend the problem isn’t there.  It is there, but if it helps you sleep better at night to ignore the problem, go right ahead.  The problem will be waiting for you when you come back to work the next day as long as you refuse to act on it.


No measurement, no project

March 24, 2008

It seems to be a common theme in dealing with improvements in the Systems world that we never have any data.  There’s a bunch of reasons why not:

  1. People don’t think to measure things.  I think this is fairly common until you have a problem and then usually people only measure the output measures of the process.
  2. Systems produces fairly few widgets.  Unlike a true factory, I’ve worked in organizations that put out 30-40 new features a year or less.  With that in mind, I’d have to wait at least a year to know if a change I made is making any difference, maybe even two years – one year for the baseline data and another for the changed process data.

No business is going to wait 2 years to find out if a change worked.  And darn it all, this lack of data is getting in our way of progress!

So, when one of my GBs came to me recently and said “someone told me I won’t be able to get any data for my project, can I just estimate the baseline?” I responded with “if you don’t have measurements, you don’t have a project.”

If humans can detect the presence and figure out the size of planets orbiting stars hundreds or thousands of light years away based on how the star wobbles in response, surely we can detect the presence of a defect created within the group of cubicles we live in!  Even if we cannot observe the defect being created directly it must have some impact on the world that we can examine.

For example, let’s take production defects.  When something goes wrong in production, a very unhappy customer tells one of our phone reps about it.  In turn, the phone rep logs the defect into some defect tracking system with a bare minimum of information. Eventually, that defect makes it into the hands of a maintenance person who fixes the defect and closes the issue in the tracking system.  Nowhere along the line is that production defect ever tied back to the project that caused it.

As a result, we are incapable of directly observing the impact of projects on production.  We know that projects cause defects in production, but since there are bunches of projects making changes to production every month it is impossible to tell which project caused what defect.

We can, however, tie pre-production defects back to the projects that caused them.  When our testers find a defect during testing, they log a defect which includes the project they were testing.  I’ve seen two models of defect detection for us:

Defect Detection Patterns

In pattern A, for every defect we find, more simply get by us.  Bigger projects simply produce more bugs than smaller projects.  We find some number of bugs, but never all of them.  In pattern B, for every defect we find, fewer get into production.  This is the pattern you’d expect from an effective QA organization. What’s great about either pattern is that there is a relationship between what we find in production and what we find in testing.

Defect Containment

As a result of the inability to tie prod defects back to the project, we report a number called defect containment each month (like the chart above).  This is a fairly commonly reported output measure.  The problem is how we calculate it.  Containment is N / (N+P) where N is non-production defects opened in a given month and P is production defects opened in the same month.  The thing is, the code we are currently testing in June is not the code that is in production at the time.  Relating the two values together is meaningless.  If you do a correlation test between the non prod and prod defects in the same month you’ll find it has essentially no relationship.
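The monthly containment number described above is simple to compute, as this Python sketch shows.  The function name and the counts are hypothetical.

```python
def defect_containment(non_prod_defects, prod_defects):
    """Containment as defined in the post: N / (N + P) for a single month."""
    total = non_prod_defects + prod_defects
    if total == 0:
        raise ValueError("no defects opened, so containment is undefined")
    return non_prod_defects / total


# Hypothetical month: 100 defects opened in testing, 25 opened in production.
print(defect_containment(100, 25))  # 100 / 125 = 0.8
```

The arithmetic is fine; the trouble is that N and P from the same calendar month describe different code.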

But this is what happens when you can’t tie a defect back to the project that caused it.  So how do I know that we have two patterns of defect detection?  Indirect observation, that’s how! 

  1. I know that the code is being tested in June. 
  2. I know that 100 defects were reported against all projects being tested in June. 
  3. I know that the code being tested in June will go live in August. 

Therefore, I can try to find a relationship between the June testing defects and August production defects.  When I do this, I find a very strong correlation between non-production defects and prod defects.  Sure, it’s not perfect.  Some defects may not be found until September, October or even later.  But the reality is that most of our production defects crop up relatively quickly.
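Here is a rough Python sketch of that indirect observation: pair each testing month with the production month two months later, then correlate.  The monthly counts are invented for illustration, the two-month lag comes from the June-to-August example, and Pearson correlation is assumed as the test.

```python
import math


def lagged_pairs(test_defects, prod_defects, lag):
    """Pair testing month i with production month i + lag."""
    if lag < 0:
        raise ValueError("lag must be zero or positive")
    return list(zip(test_defects, prod_defects[lag:]))


def correlation(pairs):
    """Pearson correlation coefficient over a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least 2 pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly defect counts, January through August.
testing = [80, 120, 60, 100, 140, 100, 90, 70]
production = [12, 9, 16, 25, 11, 21, 29, 19]

same_month = correlation(lagged_pairs(testing, production, lag=0))
two_months_later = correlation(lagged_pairs(testing, production, lag=2))
print(round(same_month, 2), round(two_months_later, 2))
```

In this made-up series the same-month relationship is weak while the two-month lag lines up closely, which mirrors the pattern described above.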

Getting back to my Green Belt’s conundrum, you can see how I would say there was no project if there were no measurements.  Just because you can’t put your finger right on the data doesn’t mean there isn’t a way of seeing it indirectly.


“Management by Fact” not “Reaction by Fact”

March 6, 2008

NOTE:  I reworked this post based upon more recent similar experiences.  Same concept, different story to tell it. 

I ran into a disturbing activity this afternoon at work.  We were discussing an MBF for a testing process and looking at the trailing indicator – incidents in production.  The number of incidents in production rose last month “unexpectedly.”   By unexpectedly, I mean the process owner and customer did not like the number.  Naturally, I wanted to go look at our leading indicators to see what changed.

Our process owner did not.  He wanted to get the data for each and every ticket and look at it and figure out a root cause and then make process changes to prevent each of those kinds of defects from happening again.  Ack!  Where did we go wrong explaining this MBF?

The point of having an MBF is understanding the critical few levers that you have to change the process performance.  If you see a leading indicator going in the wrong direction, you can rest assured (if you’ve done your job right) that the output result will go south as well.  But did our process owner look at the leading indicators or the countermeasures plan to see what to do?  No, he wanted to look at the root cause of each ticket and see how to prevent that specific issue from occurring again.

Looking at each ticket is no way to go about managing your process!  Why not?  This is reacting, not managing.  Leading indicators warn you of something to come and if you aren’t seeing it until after it happened, you aren’t looking. 

Let’s say I made candy for a living.  Each month I might measure the number of defective packages of candy.  So, my output measure would be “defective packages of candy per month.”  Then, were it a real MBF, I’d have leading indicators that would be affecting my good candy package yield.  Let’s say, sugar quality, time in the flavoring machine and number of rats scurrying around the factory floor.  If one of these leading indicators were to go the wrong direction, say an increase in the number of rats in the factory, I could conclude that I was likely to have a lot of defective (in this case, probably half eaten) packages of candy.

But rather than do that, we see MBFs where the number of defective packages of candy is the trailing indicator (this is still good) and the “leading indicator” is defective packages of candy by root cause.  It’s the same data as the first chart, just stratified into candy packages that are “too sour”, “flavorless”, or “half-eaten”, and maybe some combination of the three!

Sure, I now know what went wrong with each package of candy, but I haven’t done anything to prevent it.  I could have taken action had I known that an increase in rats was going to cause more candy packages to get eaten.  And I could have done it BEFORE the candy got eaten.  Now all I can do is look at my root cause data and say “oh, maybe we should put out some rat traps” or maybe since we failed to keep the rat population in check in the first place there is nothing left to do but close up shop.  The rats are running the factory now.
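To make the candy example a bit more concrete, here is a minimal sketch of checking leading indicators before any candy ships.  Everything in it is made up for illustration: the `LeadingIndicator` class, the limits, and this week's readings are assumptions, not anything from a real MBF.

```python
from dataclasses import dataclass


@dataclass
class LeadingIndicator:
    name: str
    value: float
    limit: float
    higher_is_worse: bool = True  # e.g. rats: more is worse; sugar quality: less is worse

    def out_of_bounds(self) -> bool:
        """True when the reading has crossed its limit in the bad direction."""
        if self.higher_is_worse:
            return self.value > self.limit
        return self.value < self.limit


def indicators_needing_action(indicators):
    """Return the names of leading indicators that call for a countermeasure now."""
    return [i.name for i in indicators if i.out_of_bounds()]


# Hypothetical readings for this week (assumed numbers, for illustration only).
this_week = [
    LeadingIndicator("rats on factory floor", value=7, limit=2),
    LeadingIndicator("sugar quality score", value=93, limit=90, higher_is_worse=False),
    LeadingIndicator("minutes in flavoring machine", value=11, limit=12),
]

print(indicators_needing_action(this_week))
```

With these assumed readings only the rat count is flagged, which is the cue to act while the candy is still intact rather than sorting half-eaten packages by root cause afterwards.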

The next time you go to make an MBF, don’t provide endless stratifications of the trailing indicator just because some uninformed user thinks that’s how you manage a process.  Educate them.  It’s called “Management by Fact” not “Reaction by Fact” for a reason.


Getting the noise out of measurement

February 13, 2008

I discovered that we spend too much time some days trying to figure out how to slice up the data we want to measure.  The conversations about it make my head hurt!  I dread people coming to my desk to talk about the MBF format or data collection plans.  Conversations go something like this: 

Someone else: “We’d like to measure the old requirements process versus the new requirements process.” 

Me: “Ok, well, we’ll get the cost data from our timekeeping system and … “

Someone else: “No wait, we can’t do that, the data is too noisy and we’ll never be able to tell.  I want you to get the data for only this list of projects and only if they have these exact billing codes under them and only if they were started after July and only if the moon is full and …”

Me: [douses self with gasoline à la the poor passenger stranded next to Ted Striker in Airplane!]

Why do we do this?!?  First off, who says the data is too noisy to be measured?  Yes, there is noise in the data, there likely always will be.  The noise in this example is introduced by people who don’t know how to bill their time properly, people who know how to bill their time properly but are too lazy to switch to a new billing code, people who are deliberately not billing because they’re trying to fudge the budget and the list goes on.  But all that said, across the vast multitude of projects we run, some amount of this is always going on.

And then, of course, there’s potential measurement system noise.  Were I measuring the size of pieces of paper created by a cutting machine with a ruler, I might misread the ruler or be lazy and approximate.  So in addition to the paper cutting machine not producing consistently sized pieces of paper (part variation), I might be creating noise just by examining the paper.  Fortunately, when extracting data from a timekeeping system, this isn’t happening.  The data is digital and the measurement system very repeatable.  No matter how many times I pull the same data set from the timekeeping system, I’m going to get the same result.  It’s also highly reproducible.  No matter who runs the job to pull the data you’ll still get the same result.

Secondly, who says your selection criteria are making things better?  In our efforts to get the perfect slice of data out of the system we forget the purpose of measuring in the first place.  We want to know if one population is different from another and that’s it.  Now certainly, if there’s enough noise in the data you will never be able to tell if the two populations are different.  But we haven’t really figured that out, have we?  We just assume the data is bad and begin to over-specify the sample population.  This is just as bad if not worse.  Random sampling is key to making inferences about the larger population.  By adding in all these conditions on how we sample the data we are taking away randomness.  All the candidates are being cleaned out of the pool by us trying to find just the right candidates to prove our point.

And going forward, when you measure your new process, you will be unable to do this kind of careful selection.  All the new process data points will continue to have noise created by normal people behavior, like being lazy about timekeeping.  And then what exactly are you comparing?  A highly sanitized sample that isn’t representative of the population vs. a representative sample.  Even if you found a statistically significant difference between the two samples it’d be suspect at best.
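Here is a small sketch of how that goes wrong.  The numbers are invented: six hypothetical old projects, half of which pass a hand-picked “clean billing” filter, and six new projects run under exactly the same process with the same timekeeping noise.  Since nothing actually changed, an honest comparison should show no difference.

```python
from statistics import mean

# Hypothetical old projects: hours billed, and whether the project survived
# a hand-picked "clean billing" filter (assumed data, for illustration only).
old_projects = [
    {"hours": 100, "clean": True},
    {"hours": 140, "clean": False},
    {"hours": 110, "clean": True},
    {"hours": 150, "clean": False},
    {"hours": 90, "clean": True},
    {"hours": 130, "clean": False},
]

# New projects: same process, same noise, and no way to pre-select them.
new_hours = [100, 140, 110, 150, 90, 130]


def apparent_change(old_hours, new_hours):
    """Difference in mean hours, new minus old."""
    return mean(new_hours) - mean(old_hours)


all_old = [p["hours"] for p in old_projects]
clean_old = [p["hours"] for p in old_projects if p["clean"]]

like_for_like = apparent_change(all_old, new_hours)   # noisy old vs. noisy new
sanitized = apparent_change(clean_old, new_hours)     # filtered old vs. noisy new

print(like_for_like, sanitized)
```

In this made-up data the like-for-like comparison shows no change, while the sanitized baseline makes the new projects look 20 hours worse.  The filter manufactured that difference, not the process.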

In software development we often measure after the fact because our “widget creation process” turns out relatively few new units each year.  In order to get the required 20-30 minimum sample size we have to go back to old projects.  Since we didn’t plan on measuring these old projects, chances are there’s noise in the data from those projects.  Chances also are that the noise continues in the new projects we are going to measure after the change.  Rather than arbitrarily filter the old data to get something you think is noise-free, simply compare the new process to the old process, noise and all.  As long as the noise isn’t greater than the signal, you will still be able to detect a difference if you made one.  If after looking you can’t detect it, then start talking about noise in the data.  Regardless, it’s not like you are going to be able to go back and sanitize the data anyway, so if you can’t prove a change with the noise included, you probably can’t prove a change at all.
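One way to do that noise-and-all comparison is a two-sample test that doesn’t assume the two groups have equal spread.  The sketch below uses Welch’s t statistic, which is my choice for illustration and not necessarily what any particular team uses; the function name `welch_t` and the hours are assumptions, and a real comparison would want the 20-30 projects per group mentioned above rather than the eight shown here.

```python
import math
import statistics


def welch_t(old, new):
    """Welch's t statistic (new minus old) and its approximate degrees of freedom.

    Needs at least two values per group and some variation in the data.
    """
    var_old = statistics.variance(old) / len(old)  # squared standard error, old
    var_new = statistics.variance(new) / len(new)  # squared standard error, new
    t = (statistics.mean(new) - statistics.mean(old)) / math.sqrt(var_old + var_new)
    df = (var_old + var_new) ** 2 / (
        var_old ** 2 / (len(old) - 1) + var_new ** 2 / (len(new) - 1)
    )
    return t, df


# Hypothetical, unfiltered hours per project (assumed data, noise included).
old_hours = [120, 135, 150, 110, 160, 145, 125, 140]
new_hours = [100, 115, 130, 95, 140, 120, 105, 118]

t, df = welch_t(old_hours, new_hours)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The larger the absolute value of t for a given df, the more the signal stands out from the noise.  Python’s standard library has no t distribution for turning this into a p-value; if SciPy is available, `scipy.stats.ttest_ind(old_hours, new_hours, equal_var=False)` runs the same test and reports one (its statistic is old minus new, so the sign is flipped relative to this sketch).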