Estimation from history

October 1, 2009

If we don’t study the past, we are doomed to repeat it; any historian (or history buff) will tell you that. And yet, when it comes to estimating, if we DO study the past, then we are doomed to repeat it.

Of all the potential methods of estimation, I thought that building up knowledge about past performance would be one of the most useful for producing accurate future estimates. Ideally, as enough time passes, you begin to see repeating patterns in your projects and can leverage that knowledge to assemble new estimates based on actual prior experience.

On a journey of continuous improvement, what could be worse? Using your historical performance to predict the future is valuable only if you are happy with your past performance. The market is never happy with your past performance. Even if that performance is good, your competition will be looking to best you. If you build estimates from your past, then you are going to repeat your past. Parkinson’s law (work expands to fill the time available) will take care of that for you, even if you try to become more productive.

It may seem like a subtle tweak to using a historical database to create estimates, but I think it is an important one: you can’t just estimate based on past performance. You can use that knowledge, but then you have to be prepared to set a goal for your project that is 10% (or some other percentage) better than the last time you did it. You must always be striving for better. Whether you achieve it or not, the culture must be that no matter how good your past work has been, we must always be looking for more. Process improvement is not an event; it is a journey.
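
A minimal sketch of that idea in Python, assuming effort per requirement as the unit of history: derive a baseline rate from past projects, then set the goal a fixed percentage better than what history alone predicts. The function names, the sizing unit and the sample numbers are hypothetical; only the "10% better" target comes from the text above.

```python
from statistics import mean


def historical_baseline(past_actuals: list[float], past_sizes: list[float]) -> float:
    """Average effort per unit of size across past projects (hypothetical sizing unit)."""
    if len(past_actuals) != len(past_sizes) or not past_actuals:
        raise ValueError("need matching, non-empty history")
    return mean(a / s for a, s in zip(past_actuals, past_sizes))


def stretch_estimate(size: float, baseline_rate: float, improvement: float = 0.10) -> float:
    """Estimate from history, then set the goal `improvement` better than the past."""
    historical = size * baseline_rate
    return historical * (1.0 - improvement)


# Hypothetical history: effort in person-months, size as a requirement count.
rate = historical_baseline([20.0, 30.0, 12.0], [40, 60, 24])
print(stretch_estimate(50, rate))  # history alone says 25.0; the goal printed is 22.5
```

Keeping the historical figure and the stretch goal separate makes it easy to report both: what the past says, and what you are aiming for.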


Prove me wrong

February 7, 2009

I’m not much of a reader, honestly. I know that seems strange, but I like to take in information in small quantities, like articles, short chapters, etc. Works that require me to read from the beginning to a distant end to get the whole story don’t hold my attention. So it’s a bit odd that I have been reading Nassim Taleb’s “Fooled by Randomness” recently, but it was given to me by a coworker and I felt some obligation to read it. So far, so good, actually. It has a little less data than I like; in fact, it holds what might amount to a handful of truths wrapped in long, long prose.

Regardless, as I read through I came upon “the black swan” which finally allowed me to tie together a blog entry that I’ve been hanging onto for a long, long time.

So, here’s the idea of the black swan: “I have never seen a black swan, therefore no black swans exist.” This is a very difficult statement. On one hand, it might seem true: if you randomly sampled the population of swans, you’d probably never see a black one. They do in fact exist, in Australia, but even with a really large sample, if you never took a swan from down under, you might conclude that no black swans exist.

The author’s point is that the opposite statement is much easier to establish. If you can find even one black swan, you can confidently state that “not all swans are white.” Say you sampled as few as two swans, and one was white and one was black. Ta-da, your point has been proved. The first statement, on the other hand, is absolute: no matter how large your sample, it cannot be proved until you have observed every swan that has ever existed or ever will.

Anyway, hearkening way back to the Fixed Price Shop entry, we encounter a similar story. I had concluded with a high degree of confidence that all estimates fell within a very small range of results and that we were therefore, conclusively, a fixed price shop.

Karl Popper introduced the idea of falsifiability: the ability of a claim to be proved false. Scientific research is falsifiable; religion is not. That’s not a judgment on whether religion is right or wrong; it simply means that, because it is untestable, it can be neither proved nor disproved. I’m probably paraphrasing too much, but essentially the point of falsifiability is that we take a hypothesis and beat it to death trying to prove ourselves wrong. Only once we have excluded the alternative hypotheses that would make our “discovery” a non-event can we finally call what we have a discovery.

So there I was, claiming we were a fixed price shop. All my observational data said so! That’s when we needed to exclude the other possibilities. See, being a fixed price development shop meant that, regardless of the work requested, we’d produce the same estimate anyway.

So, what would that mean? Well, it would have to mean that there was no correlation between the requirements and the price paid for the work. The first issue is that statistics cannot prove a non-event; it can only tell you that it failed to find an event. It’s subtle, I know, but the point is that just because you can’t find statistical evidence of a relationship doesn’t mean the relationship doesn’t exist. But let’s assume for a minute that my failure to find a relationship meant something. Even then, there is a non-event explanation: all projects could simply be the same size. If people always requested about the same amount of work, then it would make sense that the projects cost about the same.

I admit, it is hard to figure out how big the requirements are, but I just did something simple.  I took a sample of projects and counted the number of rows of requirements in the documents.  It was very convenient that most of our requirements documents are written in a nice list format.

Indeed, I could find no correlation between the cost of my sample projects and the number of requirements.  It sure seemed that the number of requirements varied but the cost did not.  The alternative hypothesis that all requests were the same size was defeated, right?  Wrong!

What if I found no relationship between the two not because there was no relationship, but because my method of counting was no good? Maybe all work really was the same size and the way I sized the work was bad? How could we prove that?

Well, we knew that the estimating method was essentially: read the requirements, estimate the development effort, then tack on all the other surrounding resources. In fact, this is exactly why we believed the estimates were fixed price. There was a variable component to the price (the development work), but it was dwarfed by all the garbage people put around it. The end result could vary very little because the signal was essentially drowned out by the noise.

Still, the prior research left room for a non-event. We might have a black swan. We knew that the sample projects’ requirement counts bore no relation to their costs, but we had never found a case where they did.

Well, I couldn’t wait to observe every project that ever was or ever will be, so we had to figure this out some other way. Recall that the estimating methodology, in theory, estimated the development portion of the work from the requirements. If we could show that requirement counts were correlated with development effort but not with overall effort, that would show that our counts were meaningful and that we were correctly observing the process.

So, we tested the same requirement counts against just the development portion of the effort. And wouldn’t you know it, the requirement counts were highly correlated with the cost of development!
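
Here is a small Python sketch of the two correlation tests, using a hand-rolled Pearson coefficient. The numbers are invented toy data chosen to mimic the pattern described (development effort tracks the requirement count while total effort hovers around one price); they are not the actual project data, and the post does not say which correlation statistic was used.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: requirement rows counted per project.
requirements = [10, 20, 30, 40, 50]
# Development effort scales with the requirement count...
dev_effort = [2 * r for r in requirements]
# ...but total effort (development plus everything around it) stays near one price.
total_effort = [310, 290, 300, 305, 295]

print(round(pearson(requirements, dev_effort), 3))    # 1.0
print(round(pearson(requirements, total_effort), 3))  # -0.3
```

A weak coefficient on five toy points is not proof of no relationship, which is exactly the caveat about non-events above; the sketch only shows the shape of the comparison.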

It’s a long way round to get to a simple point.  Once you find that you have proof of something, it is your responsibility to posit the non-events that could undo your discovery.  Then, test the non-events to see if you’ve just found something mundane or something truly interesting.  And then, if necessary, test the non-events of those non-events.  In my story, though I left out some paths for brevity, it goes as follows:

Statement: All projects cost approximately the same because we don’t estimate the work, we just put in arbitrary efforts. 

Possibility: But, what if really what is happening is that all work requested is just the same size? 

Statement: But we can show that the number of requirements varies from project to project and has no correlation with the cost.

Possibility: But what if really what is happening is that your counting of requirements is no good?

Statement: But we can show that the requirement counts are highly correlated with the development effort, but not with the overall effort.

It’s my job to prove me wrong as part of a thorough analysis. It’s my job to explore the possibility that I have found nothing, to show that my claims are potentially falsifiable. And then to show that I can refute those claims of non-events. It’s my job to cut down my own research until the alternative explanations have been pruned away.

It’s your job as well.  Try to prove yourself wrong, or you can be sure that someone else will.


Unintentionally fixed price software development

February 10, 2008

I’ve been working on a project at work to improve our client pricing process. Our business partners have long complained about how well (or how poorly, as it were) Systems does these estimates. A few months back we held a two-day offsite to get our systems folks together with our business folks and decide on a new way to do estimates that would improve the results. I think the outcome was less than stellar, and it wasn’t until late last week that I understood why. We do three pricing estimates during any project: a high-level estimate based on “back of the napkin” requirements, a more detailed estimate once we have requirements, and a final estimate once we’ve completed the design.

Everyone got together at this offsite and decided it was all about communication, and that we should set up a central team to coordinate estimates and get the right people in the room so that we got a good estimate. At the same time, I was assigned to head up the tools team: a group of folks who would focus entirely on choosing the right tool (an Excel spreadsheet, MS Project, something else…) for doing and presenting estimates. While researching tools, I met with all kinds of business and systems folks to understand what outputs they needed from an estimate. I also learned some very interesting things along the way.

For one, the business was happy with project actuals compared to our final estimates. In fact, I pulled data for some 200+ projects and found that most of them fell in the +/- 10% range; that is, our actuals were almost always within 10% of our estimates. I think I’ve written about it before, but I should note that being off by more than 10% was cause to consider a project’s status “red”, meaning a seriously out-of-control project. It was more than a bit suspicious that so few projects fell outside that range.
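
As a rough sketch of that check in Python: the share of projects whose actual cost landed within +/- 10% of the final estimate, plus the "red" rule for anything outside it. Measuring the deviation relative to the estimate is an assumption, and the figures are hypothetical, not the 200+ projects mentioned above.

```python
def share_within(estimates: list[float], actuals: list[float], tolerance: float = 0.10) -> float:
    """Fraction of projects whose actual cost is within +/- tolerance of the estimate."""
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("need matching, non-empty lists")
    hits = sum(
        1 for est, act in zip(estimates, actuals)
        if abs(act - est) <= tolerance * est
    )
    return hits / len(estimates)


def is_red(estimate: float, actual: float, tolerance: float = 0.10) -> bool:
    """A project more than `tolerance` over or under its estimate counts as "red"."""
    return abs(actual - estimate) > tolerance * estimate


# Hypothetical final estimates and actuals.
estimates = [100.0, 250.0, 80.0, 400.0]
actuals = [104.0, 240.0, 95.0, 396.0]
print(share_within(estimates, actuals))                    # 0.75
print([is_red(e, a) for e, a in zip(estimates, actuals)])  # [False, False, True, False]
```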

Of course, if our estimates were so close to our actuals all the time, what exactly was the business unhappy about? It turns out they were unhappy with the quality of our early estimates. The business wants to make major yearly decisions about which projects to run, and if they ask us for an early estimate, they don’t expect subsequent estimates to double, triple or quadruple the cost. In fact, more than one person I talked to said early estimates had to be 80% accurate. One of these very respectable business folks used to work for a fixed price company, so he knew that kind of estimate quality was possible.

We continued our research with this new information in hand and decided that we needed to work specifically on getting a high-quality early price.  In the absence of requirements, we believed that a parametric estimating model would be the best way to get a good early estimate.  Little did we know how easy it was going to be…

We built a set of about 45 questions, mostly yes/no, that a business person could answer about a project. Imagine a checklist with items like “is product X being modified?”, “how many new web pages are being created?” and so on.

Then we dug up a sample of old projects and filled out the questionnaire for each one. Finally, we added the final budget to each project’s data. From that we were going to build a regression equation. A funny thing happened while building it. I use Minitab for my statistical work, and I was having trouble getting it to build a regression equation using all 45 predictors. By complete fluke, I selected a small subset of predictors and tried to get an equation just to figure out why Minitab was spitting out an error. The craziest thing happened!

Using just these 6 predictors, I was able to predict the final cost of a project with > 95% accuracy! What was even crazier were the predictors themselves. They weren’t anything complicated like “how many web pages are being modified?” or “what is the approximate transaction volume?” Rather, I could get this super accuracy by answering these 6 questions:

  1. Is Product A being modified?
  2. Is Product B being modified?
  3. Is Product C being modified?
  4. Is Product D being modified?
  5. Is Product E being modified?
  6. Is Product F being modified?
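
Here is a minimal pure-Python sketch of that kind of parametric model: ordinary least squares on six yes/no predictors (1 if the product is modified, 0 otherwise) plus an intercept, solved through the normal equations. The post used Minitab; this is not Minitab's implementation, just the textbook approach. The base cost, per-product costs and project list are hypothetical, generated so that each touched product adds a fixed amount, which is the "one price" pattern described here.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: list[list[int]], costs: list[float]) -> list[float]:
    """Least-squares coefficients [intercept, b1..bk] via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, costs)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs: list[float], row: list[int]) -> float:
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical base cost and per-product costs for products A..F.
BASE = 50.0
PRODUCT_COST = [120.0, 80.0, 200.0, 60.0, 150.0, 90.0]

# Each row answers "Is Product A..F being modified?" with 1/0.
projects = [
    [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 0], [1, 0, 1, 0, 0, 1],
]
costs = [predict([BASE] + PRODUCT_COST, p) for p in projects]

coefs = fit_ols(projects, costs)
print([round(c, 1) for c in coefs])
print(round(predict(coefs, [0, 1, 0, 0, 1, 0]), 1))  # a project touching B and E
```

Because the hypothetical costs were generated to be exactly additive, the fit recovers them; real project data would leave residuals, and a statistics package would also report how well the equation fits.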

Where I work, we offer a suite of products to our clients, so the products have interrelationships. Any given project or enhancement may affect one or more of our products, and it’s quite common for an enhancement to affect a few of them at once. Despite all the variety of enhancements we do, all the lines of code (at least one product has 750,000+ lines of code), and all the different people and features, pretty much every project is one price!

All this effort was going into getting the right teams of folks together to do an estimate and the reality is that I can tell you what the end result is going to be without asking a single person. How did we end up here?  How was it that no matter how complex or how easy a change was it cost the same thing?  When did we become a fixed price shop?

I have no proof, but I believe years of trying to improve our estimate quality got us here. Each team, in its own way, attempted to figure out how to make estimates quick, easy and accurate. One of the teams I used to work for came up with staffing models. The idea was that they estimated just the development effort and then extrapolated the rest of the project cost from that number. So, for example, if you had between 1 and 2.4 person-months (PM) of development, you chose staffing model #1. Each model told you how much systems analyst, project manager, tech lead (and so on) effort you needed for a coding effort of that size. Simply take the base model, tack on the development effort, tweak slightly and ta-da… you have an estimate.

It turns out that most of the projects they do fall into the same range (2.5 to 5 PM), so the same staffing model gets used over and over and over again. Essentially, work is expanding to fill the time allotted. The reason we’re a fixed price shop is that, in order to never run over budget, we’ve simply grown our estimates to the point where even if everything goes horribly wrong we can still get the project done. Does it seem right to you that 1PM of coding would require the same percentage of effort from an analyst as 2.5PM of coding? Is that even a reasonable differentiator? Sometimes a change is just a lot of coding and requires almost no analysis (like repetitive code changes), and sometimes it’s not much coding at all but needs a lot of research.
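
A rough Python sketch of how a staffing-model estimate behaves. Only the band boundaries (1 to 2.4 PM for model #1, and the common 2.5 to 5 PM band) come from the post; the person-month figures per role, the model numbers 2 and 3, and the catch-all band above 5 PM are hypothetical placeholders.

```python
# Hypothetical overhead (person-months) per role for each staffing model.
STAFFING_MODELS = {
    1: {"systems analyst": 1.0, "project manager": 0.8, "tech lead": 0.5},
    2: {"systems analyst": 2.0, "project manager": 1.5, "tech lead": 1.0},
    3: {"systems analyst": 3.5, "project manager": 2.5, "tech lead": 1.5},
}


def pick_model(dev_pm: float) -> int:
    """Map development person-months to a staffing model band."""
    if dev_pm <= 0:
        raise ValueError("development effort must be positive")
    if dev_pm < 2.5:   # post: 1-2.4 PM -> model #1 (below 1 PM assumed to land here too)
        return 1
    if dev_pm <= 5.0:  # post: the common 2.5-5 PM band (model number assumed)
        return 2
    return 3           # hypothetical catch-all for larger efforts


def staffing_estimate(dev_pm: float, tweak_pm: float = 0.0) -> float:
    """Base model overhead + development effort + a manual tweak."""
    overhead = sum(STAFFING_MODELS[pick_model(dev_pm)].values())
    return overhead + dev_pm + tweak_pm


for dev in (1.0, 2.4, 2.5, 5.0):
    total = staffing_estimate(dev)
    print(f"dev {dev} PM -> total {total:.1f} PM, dev share {dev / total:.0%}")
```

Within a band every development effort gets the same overhead, so an extra person-month of coding inside a band adds exactly one person-month to the estimate, while crossing a band boundary adds a whole new block of overhead.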

What is the business really saying when they say they don’t like our estimates? It isn’t that the estimates are unreliable; it’s that we’re simply outrageously expensive. At least we now know there’s lots of room for improvement.


One long rant about estimating

November 3, 2007

I finally figured out something I might actually want to write about: process. By process, I mean the software development process and its related parts. I realized the other day that I see a lot of nonsense and badly applied good ideas, and I feel like I should say something about it. So I hope to share my ideas, my “truths” about software development, and if nothing else inspire you to respond with a long rant of your own.

My first target: estimating. In particular, one graphic about estimates shows up over and over and over again, and I think it is just downright wrong. You’ve assuredly seen the familiar “funnel” (often called the cone of uncertainty) that people use to describe how “good” estimating proceeds. Early on, at Estimate #1, there might be a margin of error of a factor of 4 in either direction. So, if you estimated something would cost $100, it might cost as much as $400 or as little as $25. You just don’t know, because you have only very basic software requirements at this point.

[Figure: The Traditional Estimate]

Now don’t get me wrong, I agree with the idea of an estimate having both a high and low range and that it gets more accurate as you know more.  It’s the graphic itself that bothers me.  Here’s why…

When someone estimates again after learning more, the next estimate is bound to go up or down; it most likely isn’t going to stay at $100. Specifically, I’m bothered by the worst-case scenario. Let’s say we make our first estimate of $100. At worst, it could be $400. By the next estimate point we’ve learned more, and we land at the high end of our prior range: we now estimate $400. Ouch! Our estimate just rose to 4x what we told the customer at our last estimate point. But it is going to get worse! If we continue this trend, that $400 turns into $800 and so on…

[Figure: The Worst Case Estimate]

By the time it’s all over, we’re at an estimate of $1200, and if the worst case plays out, our final cost will be $1500: 15 times what we originally told the customer. The sad thing is, I’ve seen projects do exactly this.
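
The worst-case arithmetic is easy to reproduce. This Python sketch assumes funnel multipliers of 4, 2, 1.5 and 1.25 at successive estimate points; those values are an assumption, inferred because they reproduce the $400, $800, $1200 and $1500 figures in the text (the original graphic is not available). Each new estimate lands at the top of the previous estimate's range.

```python
# Assumed funnel: upper-bound multiplier at each successive estimate point.
FUNNEL = [4.0, 2.0, 1.5, 1.25]


def estimate_range(estimate: float, multiplier: float) -> tuple[float, float]:
    """Low/high bounds for an estimate with a +/- `multiplier`-times margin."""
    return estimate / multiplier, estimate * multiplier


def worst_case_path(first_estimate: float, funnel: list[float]) -> list[float]:
    """Each re-estimate lands on the high end of the prior estimate's range."""
    path = [first_estimate]
    for multiplier in funnel:
        _, high = estimate_range(path[-1], multiplier)
        path.append(high)
    return path


print(estimate_range(100, FUNNEL[0]))  # (25.0, 400.0)
path = worst_case_path(100.0, FUNNEL)
print(path)                            # [100.0, 400.0, 800.0, 1200.0, 1500.0]
print(path[-1] / path[0])              # 15.0
```

The fourth value ($1200) is the final estimate; the last value is the worst-case final cost.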

Now, I will accept that there is another way to interpret the funnel. Let’s say your first estimate in fact defines the correct upper boundary. That is, if you estimate $100, the cost will never exceed $400. The worst-case path we could follow would then always be capped at $400. It would look like this:

[Figure: Another Bad Case Estimate]

Now, I’ll agree that this isn’t as bad as my prior scenario, but essentially what happens is that you slowly converge on the maximum defined by the original estimate. This still leaves a very unhappy customer: a final estimate that is 3.6 times what you originally told them. No wonder managers get mad at software developers! This funnel we all see isn’t helping anyone set expectations for what a series of ever-refined estimates will really do.

Do I have an answer?  No, but maybe in a subsequent rant I’ll come up with one.  For now, I think you can save yourself a lot of trouble by either a) getting a lot better at doing early estimates or b) not giving an estimate until you know enough to give a good one.