What’s the deal with story points and agile estimation techniques, and why can’t we just improve our estimates over time?  That’s part of the question that a friend of mine asked recently, and I’m going to respond to it in this blog post.

My friend is relatively new to agile and so he has been asking lots of excellent and probing questions.  Here is an email he sent me:

Agile practitioners estimate work in terms of points and then after each sprint determine their velocity so that they can make better estimates about how much work remains in terms of real time.  Is there a better way to do this? I propose that we estimate the amount of work remaining in real man-hours.  Then, after each sprint, we estimate our reality_factor as (reality_factor = estimated_hours / actual_hours_taken).  The goal being that the reality_factor should converge to 1 as we get better at estimating time that it takes to do things.
Benefits:
  • If anyone asks how much work is left, you don’t have to do a calculation using the velocity to get to real man-hours.
  • As you converge, the meaning of real man-hours is universal.  With points, you have to re-converge to the meaning of 1 point whenever you start a new project or work with a new group.
Drawbacks:
  • You have to keep careful track of time.
What do you think?
He asks a very reasonable question, and he comes up with a very reasonable solution.  Unfortunately, we’ve been down this road before in software development, and that’s why the idea of story points exists.
I don’t believe the reality factor will ever converge very close to 1, because of the inherent human biases in estimation. And keeping careful track of time is a large drawback, since it is generally demotivating and encourages people to underreport the time they actually spend so that their initial estimates look more accurate. That is a self-destructive cycle, but one that developers fall into anyway.
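
To make the proposal concrete, here is a minimal Python sketch of the reality_factor calculation from the email. The sprint numbers are hypothetical, invented purely for illustration; they are not data from my friend’s team.

```python
def reality_factor(estimated_hours: float, actual_hours: float) -> float:
    """Compute reality_factor = estimated_hours / actual_hours_taken.

    1.0 means the estimate was exact; a value below 1.0 means the work
    took longer than estimated (the estimate was optimistic).
    """
    if actual_hours <= 0:
        raise ValueError("actual_hours must be positive")
    return estimated_hours / actual_hours


# Hypothetical history: (estimated hours, actual hours) for four sprints.
sprints = [(200, 260), (210, 250), (220, 300), (205, 240)]

factors = [reality_factor(est, act) for est, act in sprints]
for number, factor in enumerate(factors, start=1):
    print(f"sprint {number}: reality_factor = {factor:.2f}")
```

Note that actual_hours comes straight from the time tracking, so underreporting it pushes the factor toward 1 without any real improvement in estimation: reality_factor(220, 230) looks far better than the honest reality_factor(220, 300).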

Man-hours is also a deceptively simple metric, because it implies that adding more people (i.e., more man-hours) to a project will speed it up. Managers are therefore often tempted to add people to a project late in the process to speed up delivery, but this rarely has the desired effect. In fact, it may slow the team down further.

So keeping velocity separate from a metric like man-hours is an intentional obfuscation. It also addresses a very important issue in estimation: generally speaking, we are not very good at estimating things in absolutes. When we try to estimate in absolutes (i.e., a concrete number like 2 days), we tend to give our most optimistic estimate, and often it’s a gut reaction. There are so many uncertainties in software development that it’s very hard to estimate in absolute terms.

Fortunately, most people are much better at estimating in relative terms.  We still won’t be perfect, but it’s much easier to say that “Task B is twice as hard as Task A”.  This sort of relative estimate is much more likely to be accurate than saying “Task B will take 10 hours and Task A will take 5 hours.”

If we plan based on these relative estimates, we are much more likely to be accurate in sizing how hard a task is. And if we stay consistent in that relative sizing across estimates, then we can look at our historical velocity. Historical velocity is an absolute number, but that’s okay. We can say definitively that the team accomplished 25 points of work in the last iteration. That number is grounded in reality, so it provides a frame of reference for the relative estimates of story points.
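
The arithmetic behind this is simple enough to sketch. The following Python snippet assumes a hypothetical backlog sized in relative points and a hypothetical velocity history, and forecasts the sprints remaining by dividing the remaining points by the average of recent velocities. The three-sprint averaging window is my assumption for illustration, not a rule.

```python
import math

# Hypothetical backlog sized by relative comparison, e.g. "B is twice as
# hard as A", so A = 3 points and B = 6 points.
remaining_backlog = {"A": 3, "B": 6, "C": 5, "D": 8, "E": 13, "F": 2}

# Hypothetical history of points actually completed per sprint; the last
# entry matches the 25-point iteration mentioned above.
historical_velocity = [21, 27, 23, 25]


def average_velocity(velocities: list[int], last_n: int = 3) -> float:
    """Mean velocity over the most recent `last_n` sprints (assumed window)."""
    recent = velocities[-last_n:]
    return sum(recent) / len(recent)


def sprints_remaining(backlog: dict[str, int], velocity: float) -> int:
    """Whole sprints needed to burn through the backlog at `velocity`."""
    return math.ceil(sum(backlog.values()) / velocity)


velocity = average_velocity(historical_velocity)
print(f"remaining points: {sum(remaining_backlog.values())}")
print(f"average velocity: {velocity:.1f} points per sprint")
print(f"forecast: {sprints_remaining(remaining_backlog, velocity)} sprints")
```

No hours appear anywhere: the only absolute number is the measured velocity, and everything else is a relative size.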

So, would a “reality factor” help in estimation?  I argue that it would not, because the formula that my friend proposes above is still based on an absolute metric of hours that we simply aren’t good at estimating in.  Historical velocity serves the same purpose that my friend is trying to account for with a reality factor, and yet it still lets us stay in the world of relative estimation, which is a world we are much more likely to succeed in.

Have you ever been asked for a rough estimate, and then later regretted giving a “rough” estimate because the customer tried to hold you to it?

A familiar story

This week I spoke with an agile coaching colleague of mine, and he relayed a story that rang all too true. The team is working on a major project with a fixed-date delivery requirement. The work must be done in six months, and the team already has a well-prepared backlog of user stories to work on.

My friend prepared a chart showing projected velocity up until that delivery date, and when he gave it to the customer he qualified it by saying “these are just projections, and they are really based on gut estimates.” He made it clear that the projections were not a guarantee; they just helped the team envision whether it was possible to get everything done in time.

Can you imagine how this story ends? You probably can, because I bet you’ve been down this road before (I certainly have).

Rough estimates never stay “rough” very long

Later in the project, the team started to catch flak from the customer because their actual velocities did not match the projections made very early in the project. The customer basically said, “But you committed to this, I have the chart right here!” My friend the agile coach had to remind them that in agile, commitments are made for individual iterations, and that he had not presented the chart of projected velocities as a commitment.

It reminds me of one of my favorite quotes from Steve McConnell, author of Software Estimation:

A single point estimate is usually a target masquerading as an estimate.

My friend was not trying to give a commitment, or even a target.  He was giving a rough estimate, and he communicated that when he gave the estimate.

The problem is that his estimate, a single projected line, didn’t match the disclaimer he was giving.

Put it in writing

When you give a single-point estimate, the customer implicitly takes it to mean you will hit that estimate with 100% certainty, even if you give a verbal disclaimer. When they look back at that estimate later on, they won’t remember your verbal disclaimer or the agile training you gave them.

All the customer will see is a single line showing projected velocity, and then they will compare that to the actual velocity and not understand the discrepancy.

So how do you prevent estimate communication problems?

Here are five quick tips:

  1. Never show projected velocity (or any estimate) as a single point.
  2. When showing a projected velocity, give a range with low and high values, or a 3-point estimate (low, high, most likely). Show the whole range in a chart, so that the customer can visually see that there is a range of possibilities.
  3. As the project goes on, plot your actual velocity against that range, and show that you are within the range of possibilities you set out. (If you’re not truly in that range, then you need to reset expectations or make a course correction.)
  4. Never give only a verbal disclaimer about your estimates.  Make sure that the chart or timeline itself represents the uncertainty.
  5. Understand what is most important to your customer.  If the fixed delivery date is most important, then show a range of features or velocities that can be accomplished in that timeframe.  If the feature set is most important, then show a range of completion dates for that feature set.
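
Tips 2, 3 and 5 come down to a little arithmetic on a 3-point velocity estimate. Here is a minimal Python sketch under stated assumptions: the velocity range (18, 24 and 30 points per sprint), the six-sprint horizon, the 150-point scope, and the actual totals are all hypothetical, and the band is simply each velocity multiplied by the number of sprints. That is one simple way to build such a chart, not a prescribed method.

```python
import math
from dataclasses import dataclass


@dataclass
class VelocityRange:
    """A 3-point velocity estimate in points per sprint (hypothetical values)."""
    low: float
    likely: float
    high: float


def points_by_sprint(v: VelocityRange, sprints: int) -> list[tuple[float, float, float]]:
    """Fixed date: cumulative (low, likely, high) points after each sprint."""
    return [(v.low * n, v.likely * n, v.high * n) for n in range(1, sprints + 1)]


def completion_range(v: VelocityRange, scope_points: float) -> tuple[int, int, int]:
    """Fixed scope: earliest, likely and latest sprint in which the scope is done."""
    return (
        math.ceil(scope_points / v.high),    # best case: fastest velocity
        math.ceil(scope_points / v.likely),
        math.ceil(scope_points / v.low),     # worst case: slowest velocity
    )


def within_range(actual_cumulative: list[float],
                 band: list[tuple[float, float, float]]) -> list[bool]:
    """Tip 3: is each sprint's actual cumulative total inside the band?"""
    return [low <= actual <= high
            for actual, (low, _, high) in zip(actual_cumulative, band)]


velocity = VelocityRange(low=18, likely=24, high=30)
band = points_by_sprint(velocity, sprints=6)
print("points deliverable by the fixed date:", band[-1][0], "to", band[-1][2])
print("sprints to finish 150 points:", completion_range(velocity, 150))
print("actuals inside band:", within_range([20, 41, 55, 70], band))
```

In this made-up example the fourth sprint falls below the band, which is the point at which tip 3 says to reset expectations or change course.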

About a month ago I was working with a team planning another iteration for their project, when we had an interesting discussion about how to manage our burndown.

On all of our previous sprints, our burndowns were based on hours. That is, each day of the sprint, team members would revise their estimates of the time left on any tasks they were working on. Typically (though not always), this meant revising their estimates downward, so the curve trended downward over the course of the sprint. Ideally, it reaches zero on the last day of the sprint as all the tasks are completed.

Example burndown using range estimates

On that particular company’s projects, we have always burned down by hours. What I like about that is that you get a real sense of whether things are going astray. You often end up revising estimates upward early in a sprint, so the burndown shows that as you dove into the tasks, things turned out to be more complicated than you expected. If that’s only happening on a few tasks, it’s not a big deal: your burndown may be a little flat at first, but the iteration is probably not in jeopardy and you can usually recover.

But if the burndown takes a significant jump up, perhaps because many tasks are taking longer than expected, then you know you have a serious problem that needs to be addressed. Burning down hours makes it easier for a team to raise this red flag, in a way that burning down points does not.
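
The distinction between a few tasks growing and a broad jump can be sketched as a daily check. In this minimal Python example the task estimates are hypothetical, and so is the rule itself: a day is treated as a red flag when at least half of the tasks (an assumed threshold) were re-estimated upward.

```python
def check_burndown(days: list[dict[str, float]], share_threshold: float = 0.5) -> list[str]:
    """Classify each day after day 0 of an hours burndown.

    'ok'       -> total hours remaining went down.
    'flat/up'  -> total did not drop, but fewer than `share_threshold`
                  of the tasks grew (usually recoverable).
    'RED FLAG' -> at least `share_threshold` of the tasks were revised upward.
    """
    statuses = []
    for prev, curr in zip(days, days[1:]):
        # Only tasks that existed yesterday can have been "revised upward".
        grew = [task for task, hours in curr.items()
                if task in prev and hours > prev[task]]
        if curr and len(grew) / len(curr) >= share_threshold:
            statuses.append("RED FLAG")
        elif sum(curr.values()) >= sum(prev.values()):
            statuses.append("flat/up")
        else:
            statuses.append("ok")
    return statuses


# Hypothetical hours remaining per task, re-estimated at the end of each day.
days = [
    {"A": 8, "B": 6, "C": 10, "D": 4},   # day 0: 28 hours
    {"A": 10, "B": 4, "C": 10, "D": 4},  # day 1: A grew, total flat
    {"A": 6, "B": 2, "C": 8, "D": 4},    # day 2: steady progress
    {"A": 8, "B": 5, "C": 12, "D": 4},   # day 3: most tasks grew
]
print(check_burndown(days))
```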

An example of a burndown that is going very poorly, with the estimates staying above the high bound

The other thing I like about burning down by hours is that it provides an easy way for the team to estimate in ranges. As I’ve indicated in many other posts and in talks I’ve given at conferences, I love estimating in ranges (e.g., 4-8 hours) instead of in single points (e.g., 5 hours). Range estimation allows me to communicate the risk and uncertainty of a task more accurately. A wide range indicates more risk and uncertainty than a narrow range.

When I make my initial estimates for an iteration in ranges, then I can put some nice boundaries around my burndown, as I’ve also discussed in other posts.
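
As one plausible way to draw those boundaries, the Python sketch below sums the low and high ends of each task’s range and draws a straight line from each total down to zero on the last day. This is an assumption for illustration, and the approach in my other posts may differ in detail; the task ranges and daily actuals are hypothetical. A burndown that keeps landing above the high bound is the poorly going case in the example above.

```python
def range_bounds(task_ranges: dict[str, tuple[float, float]],
                 sprint_days: int) -> list[tuple[float, float]]:
    """Straight-line (low, high) bounds for each day 0..sprint_days.

    Day 0 starts at the summed low and high estimates; both lines reach 0
    on the last day of the sprint.
    """
    low_total = sum(low for low, _ in task_ranges.values())
    high_total = sum(high for _, high in task_ranges.values())
    return [
        (low_total * (sprint_days - day) / sprint_days,
         high_total * (sprint_days - day) / sprint_days)
        for day in range(sprint_days + 1)
    ]


def classify(actual_remaining: list[float],
             bounds: list[tuple[float, float]]) -> list[str]:
    """Label each day's actual hours remaining relative to the bounds."""
    labels = []
    for actual, (low, high) in zip(actual_remaining, bounds):
        if actual > high:
            labels.append("above high")
        elif actual < low:
            labels.append("below low")
        else:
            labels.append("within")
    return labels


tasks = {"A": (4, 8), "B": (2, 4), "C": (6, 12), "D": (3, 6)}  # hypothetical ranges
bounds = range_bounds(tasks, sprint_days=10)
actual = [28, 25, 26, 25, 24]  # hypothetical hours remaining, days 0-4
print(classify(actual, bounds))
```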

However, there is also a good case to be made for burning down in points. On this recent project, one team member who had joined from another company made that case. His preference was based in part on the fact that an hours burndown can always be “gamed”: you can remove tasks from the iteration or just give misleading estimates to make the burndown look better. Furthermore, burning down by story points gives a clear indication of what is truly done, and at what point in the iteration each story was completed. That keeps the team more focused on what is truly done, and less focused on how it got there (i.e., the task estimates). No less an authority than Jeff Sutherland, co-creator of Scrum, has stated that “The best teams I work with burn down story points. They only burn down when a story is done.”
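
To see why a points burndown stays flat until stories finish, here is a minimal Python sketch of the rule Sutherland describes: a story’s points come off the chart only on the day it is done. The `Story` type and the sample stories are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Story:
    name: str
    points: int
    done_on_day: Optional[int] = None  # None means not done yet


def points_burndown(stories: list[Story], sprint_days: int) -> list[int]:
    """Points remaining at the end of each day 0..sprint_days.

    Partial progress inside a story (finished tasks) never counts; the
    story's points burn down all at once when it is done.
    """
    total = sum(s.points for s in stories)
    remaining = []
    for day in range(sprint_days + 1):
        done = sum(s.points for s in stories
                   if s.done_on_day is not None and s.done_on_day <= day)
        remaining.append(total - done)
    return remaining


stories = [
    Story("search", 5, done_on_day=3),
    Story("export", 3, done_on_day=6),
    Story("profile", 8),               # still in progress at sprint end
    Story("alerts", 2, done_on_day=2),
]
print(points_burndown(stories, sprint_days=10))
```

The 8-point “profile” story never burns down here, however much of its task work is finished, which is why large stories make a points burndown look flat until the end of the iteration.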

The rest of the team expressed some hesitation about going this route, partly because we liked using ranges and felt they forced us to be more conservative in the stories and tasks we took on in a sprint, and partly because we felt the burndown would communicate less, since most stories would not be done until the very end of the iteration.

However, our fear that no stories would be marked done until near the end of the sprint may have been a symptom of another problem: our stories were simply too big. Jeff Sutherland notes in his blog post that “to do this, the team needs to have small stories.” If stories are not small enough for each developer to have several during the sprint, then burning down by points may not work well. I’ve heard Mike Cohn say that on average each developer should have 2-3 stories in an iteration, and I would say that is the minimum needed to make this approach work. I suspect that a slightly finer breakdown, so that everyone has 4-5 stories, would work even better.

Both approaches have a lot of merit, however, so why not do both? I realized this was an option when I recently attended an Agile Richmond talk given by Guy Beaver, co-author of “Lean Agile Software Development.”

Guy described using both charts on the same projects, in what he called an “Agile V” burndown configuration. Basically, you put a burndown of hours on the left and a burnup of story points on the right. Ideally, as your hours burndown goes to zero and your points burnup goes to 100% complete, the two form a rough V shape at the end. Some teams have even superimposed the two charts on one plot (with both an hours scale and a points scale on the y-axis) to get an “Agile X” shape in their burndowns/burnups.
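
A rough way to see the V (or the X, when superimposed) is to normalize both series to percentages. The Python sketch below is my own illustration, not Guy Beaver’s scorecard: the daily hours and points are hypothetical, and `crossover_day` simply reports the first day on which the percentage of points done meets or exceeds the percentage of hours remaining, roughly where the two lines of an “Agile X” cross when both axes are scaled to 0-100%.

```python
from typing import Optional


def agile_v(hours_remaining: list[float], points_done: list[float],
            total_points: float) -> list[tuple[int, float, float]]:
    """Per day: (day, % of starting hours still remaining, % of points done).

    Assumes the sprint starts with a positive number of hours and points.
    """
    start_hours = hours_remaining[0]
    return [
        (day, 100 * hours / start_hours, 100 * points / total_points)
        for day, (hours, points) in enumerate(zip(hours_remaining, points_done))
    ]


def crossover_day(rows: list[tuple[int, float, float]]) -> Optional[int]:
    """First day on which % points done meets or exceeds % hours remaining."""
    for day, hours_pct, points_pct in rows:
        if points_pct >= hours_pct:
            return day
    return None


# Hypothetical sprint: hours burned down daily, story points burned up on completion.
rows = agile_v([40, 34, 28, 22, 14, 6, 0], [0, 0, 5, 8, 13, 16, 20], total_points=20)
for day, hours_pct, points_pct in rows:
    print(f"day {day}: hours remaining {hours_pct:5.1f}%  points done {points_pct:5.1f}%")
print("lines cross on day", crossover_day(rows))
```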

Image from this post by Guy Beaver on Agile Journal

Guy has a post on this topic in which he describes how this approach results in an “Agile-V Scorecard.” I particularly like his point that this approach helps a new team keep its focus on delivering value. He also describes how, for a mature team, this approach makes it possible to realize early in an iteration that not enough stories are being delivered (you should be able to knock out a couple early in the sprint), and therefore that some obstacles exist that need to be removed.

I like this idea, since it combines the best of both worlds. As long as you are not adding a lot of overhead by asking developers both to re-estimate their hours remaining every day and to track when a story is “done”, it seems like a good path to try. If you combine it with my range estimation techniques on the burndown, you get the benefit of conservative estimation as well as the “focus on completion” of the burnup. The two strategies reinforce each other.

A combined approach may bring additional benefits as well. As one commenter on this forum post notes, “with this combination you can detect multi-tasking.” If the burndown of hours is progressing fine, but no completed stories show up on the story point burnup, then the team may be multitasking between stories too much. So having both charts encourages the team to follow the good lean and kanban practice of limiting work in progress and just getting things done.
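
The multitasking signal the commenter describes can be expressed as a simple rule. In this minimal Python sketch, the 40% threshold and the sample data are hypothetical: the check reports the first day on which a large share of the sprint’s hours has been burned while the story point burnup is still at zero.

```python
from typing import Optional


def multitasking_warning(hours_remaining: list[float], points_done: list[float],
                         burned_threshold: float = 0.4) -> Optional[int]:
    """First day on which at least `burned_threshold` of the starting hours
    have been burned while no story points are done yet, else None."""
    start = hours_remaining[0]
    if start <= 0:
        return None
    for day, (hours, points) in enumerate(zip(hours_remaining, points_done)):
        burned = (start - hours) / start
        if burned >= burned_threshold and points == 0:
            return day
    return None


# Hypothetical sprints: (hours remaining per day, cumulative points done per day).
focused = ([40, 34, 28, 22, 14, 6, 0], [0, 0, 5, 8, 13, 16, 20])
juggling = ([40, 33, 22, 15, 9, 4, 0], [0, 0, 0, 0, 0, 12, 20])
print("focused team:", multitasking_warning(*focused))
print("juggling team:", multitasking_warning(*juggling))
```

Both teams burn their hours at a similar pace; only the second one finishes nothing until late in the sprint, which is exactly what a single hours burndown would hide.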

On the project I mentioned at the beginning, after some discussion we ultimately decided to stick with the hours burndown rather than switch to the points burndown. Only one team member wanted the switch, and we were reluctant to change direction mid-project. But it was a good discussion, and it got me thinking about how to address both sets of valid concerns.

After hearing Guy’s talk and doing more research, I think the next time I’m in that situation I will not allow it to be a choice of one or the other.  Doing both may offer significant benefits and I’m curious to try it out.