Are function points measuring something unique?

Posted on April 5, 2010


One criticism I’ve heard of function points is that they are highly correlated with lines of code, or at least have appeared to be in the studies done so far. To some advocates of function points, this is disheartening, because function points (FP) were supposed to measure a different construct from lines of code (LOC): FP measure the size of the problem, while LOC measure the size of the solution.

In theory, LOC work just fine as a measure of software size, but they work terribly in practice. Why? Because as soon as you tell developers that their productivity will be measured in lines of code written, they simply write more (and less efficient) lines of code. Suddenly, writing code thoughtfully matters far less than producing a lot of it.

FP, on the other hand, don’t seem to have this issue. A developer can’t simply create more FP on demand. The FP count is bounded by the size of the request: it can be counted from the requirements and is therefore free of the developer’s influence. So why, then, are FP and LOC so highly correlated in studies?

As I was driving home (a common time for me to think, as I’m sure you’ve noticed), a possible answer came to me. In a theoretical, uninfluenced world, FP and LOC do measure the same thing. That is, if you go back and collect data on a bunch of past projects, you’ll find that FP and LOC are highly correlated. Because none of those historical projects were incented to produce unnecessary lines of code, the relationship holds up nicely when you collect baseline data on both measures.

The problem isn’t the baseline; it’s the next experiment we conduct. We go ahead with an FP counting system for our new projects. Then we come back, count LOC as well, and find the two still highly correlated. And for some reason we’re upset! Why did we spend all that time and money on an FP counting system if LOC would have worked just fine?! What we should do instead is run two separate pilots: one where we measure and incent on FP, and one where we measure and incent on LOC. Then, after a while, go back to population 1 (incented on FP) and collect its LOC data, and go back to population 2 (incented on LOC) and collect its FP data.

I bet you’d find that the correlation breaks down in population 2 while it holds in population 1. Why? Because once developers are incented to do the wrong thing, LOC stop being a meaningful measure: great in theory, terrible in practice. In that environment, the code can lose its relationship to the size of the problem because we are rewarding the wrong behavior. LOC don’t have to track FP, but if we don’t muck with the system, they will.
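One way to make the two-pilot comparison concrete is to compute the correlation between FP and LOC separately for each population. What follows is a minimal sketch, assuming Pearson’s r as the correlation measure; the function names (`pearson_r`, `correlation_by_population`) and the per-project numbers are hypothetical placeholders invented for illustration, not data from any study. What r real pilots would produce is exactly the open question.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


def correlation_by_population(populations):
    """Map each population name to the FP/LOC correlation of its projects.

    `populations` maps a name to a list of (fp, loc) pairs, one per project.
    """
    results = {}
    for name, projects in populations.items():
        fp = [p[0] for p in projects]
        loc = [p[1] for p in projects]
        results[name] = pearson_r(fp, loc)
    return results


# Hypothetical placeholder numbers only; real pilot data would go here.
pilots = {
    "population 1 (incented on FP)": [(100, 5000), (200, 10100), (300, 14900), (400, 20000)],
    "population 2 (incented on LOC)": [(100, 9000), (200, 11000), (300, 30000), (400, 21000)],
}

for name, r in correlation_by_population(pilots).items():
    print(f"{name}: r = {r:.3f}")
```

Pearson’s r only captures linear association; a rank correlation such as Spearman’s would be a reasonable alternative if project sizes are heavily skewed.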

FP, on the other hand, are outside the developer’s control. In an environment measured on FP, developers have no reason to artificially inflate their lines of code. Writing extra code gains them nothing, so they don’t do it.

The point here isn’t that LOC can’t measure what FP measure; they can, and likely do. The point is that FP remove the perverse incentive developers face when you measure them on LOC. What matters is the ability to assess independently how much is being built. The critical issue is the difference between what works well in theory and what works well in practice.

Posted in: Measurement