Planning with Uncertainty
One of the secrets to good estimation is to identify what you don't know, and make that part of your planning.
To the uninitiated, a software development project seems pretty straightforward. It may seem like an easy matter to, say, determine the tallest person in a room from a photo. In fact, even a young child can probably look at the photo and find the tallest person. If it’s that simple, how hard could it be to have a computer perform the same feat?
Very hard, it turns out. Computers are insidiously literal beasts, and are terrible at many things humans take for granted, like inference and understanding visual data. Trendy technologies like AI and image recognition can only simulate those abilities, through a truly terrifying amount of applied mathematics.
Yet many stakeholders want to know when they can expect the software to be ready, and how much it will cost to build. Perhaps this is because, from the outside, software development doesn’t look that different from building bridges. Just find the right pieces, bolt them together, and everything should work, right?
Unfortunately, there are three things that are never predictable in software development:
Problem complexity,
Delivery timelines, and
The viability of third-party code.
I have seen it time and again as a principal software engineer. The things that look like they should be easy are often the hardest of all. Up to this point, there have been exactly two ways of dealing with this in software estimation: (1) don’t commit to estimates, or (2) make your best guess, multiply by three, and pray nothing goes wrong.
Quantified Tasks creates a third option: plan with uncertainty! Imagine if, instead of coming up with a roadmap and stamping it with “here be dragons,” you could identify where those dragons might be hiding, and actually factor in the time to fight them?
Planning an Unlikely Journey
To understand how uncertainty works in planning, let’s consider a fictional journey. You are a powerful wizard tasked with guiding twelve dwarves and a hobbit from Hobbiton to Erebor.
Distance: As The Crow Flies
The first information you need is a roadmap. Consulting your atlas of Middle Earth, you find that the journey is approximately 1000 miles along the shortest route. If there are no delays or obstacles, the trip will take approximately 4 months.
In Quantified Tasks, this relates to Distance: approximately how long a task will take if the developer knows everything. Early in the planning process, you’ll usually have a list of Stories that need to make it into the next Release (Gravity scores of 4 and 5). Your developers will be able to determine a Distance score for each of these Stories, and the sum of those scores will give you a good picture of how long the whole Release would take to complete if everyone knew everything.
However, as with unexpected journeys across Middle Earth, nothing ever goes quite according to plan.
Friction: Rough Road Ahead
It’s a simple principle: you can move faster along flat, well-worn road than you can over steep, rocky terrain. The higher the friction, the longer it will take to make the journey. In the case of our dwarves, they have good road for the first half of the trip, but then they’re going to have to cross the Misty Mountains. Later, there’s that whole leg of the journey through Mirkwood, which will be slow due to the darkness.
In our favor, however, we can find help at Rivendell, at the house of Beorn, and at Lake-Town of Esgaroth.
In Quantified Tasks, this relates to Friction — how many known obstacles exist to completing the task. Poor documentation, technical debt, unclear requirements, and even lack of precedent can increase Friction. Good documentation, available subject-matter experts, healthy code, and good precedent to study can all decrease Friction.
Still, the trouble areas we consider here are known obstacles. We can adjust our estimates with reasonable accuracy for increased Friction.
Now what about what we don’t know? This is where things get interesting.
Relativity: Unknown Factors
Anything we don’t know contributes to Relativity, which is the Quantified Task measure for the amount of uncertainty in a task. It is best thought of as a ratio of known factors to unknown:
r1: Virtually no unknowns.
r2: More known than unknown.
r3: We know about as much as we don’t.
r4: More unknown than known.
r5: Virtually no knowns.
Even so, assessing Relativity may seem tricky: how can we measure what we don’t know? Let’s consider our map of Middle Earth. There are several places where we have varying levels of uncertainty:
The Trollshaws have trolls. We know how they act, but we don’t know where they are. (r2)
Both the north and south passes through the Misty Mountains have goblins, but we don’t know where they are or what they do. (r3)
The Gladden Fields have been reporting a lot of Warg activity. We don’t know where they are or what they do. (r3)
The north end of Mirkwood lies against the Gray Mountains, which are heavily occupied by orcs, hobgoblins, and goblins. We have no idea what safe routes are there, if any. (r4)
The Elf-path through Mirkwood is dark and narrow, and it crosses the Enchanted River, which puts people into a deep sleep. At least we know the whole path is passable, but we don’t know what’s off the path. (r3)
The Old Forest Road through Mirkwood is overrun by goblins, and has fallen into severe disrepair. (r4)
The Necromancer is active in the land to the south, near Dol Guldur. Anything could happen, and none of it good. (r5)
Although we don’t have any way of knowing what will happen through any of these areas of Relativity, we know how many of these areas we have to contend with:
We must pass through the Trollshaws to get to Rivendell, an important stop-over.
We must go over the Misty Mountains.
We must go around or through Mirkwood.
The Trollshaws give us 2 Relativity points, no matter what. Next, there are two ways over the Misty Mountains (both r3), but the South route means crossing the Gladden Fields too, which adds 3 more Relativity points, so we’ll take the North route.
Next, we must consider how to deal with Mirkwood. The lowest Relativity option is the Elf-path, with a Relativity of 3.
We just chose the safest route using two distinct tactics: (a) minimizing how many tasks contribute Relativity, and (b) favoring options with lower Relativity.
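The route choice above can be sketched in a few lines of Python. This is only an illustration, not part of Quantified Tasks itself: the scores come from the map notes above, while the `LEGS` structure and the two helper functions are hypothetical names introduced here. It also assumes, as a simplification, that any pass over the mountains can be combined with any way past Mirkwood.

```python
from itertools import product

# Each leg of the journey maps its route options to the Relativity scores
# of the uncertain areas that option crosses (scores from the list above).
LEGS = {
    "To Rivendell": {"Trollshaws": [2]},
    "Misty Mountains": {
        "North pass": [3],       # goblins in the pass
        "South pass": [3, 3],    # goblins in the pass, then the Gladden Fields
    },
    "Mirkwood": {
        "Around the north end": [4],
        "Elf-path": [3],
        "Old Forest Road": [4],
        "South, near the Necromancer": [5],
    },
}


def route_relativity(route):
    """Total Relativity for a route: one chosen option per leg, in LEGS order."""
    return sum(sum(LEGS[leg][option]) for leg, option in zip(LEGS, route))


def lowest_relativity_route():
    """Try every combination of options and keep the lowest total."""
    all_routes = product(*(options.keys() for options in LEGS.values()))
    return min(all_routes, key=route_relativity)


best = lowest_relativity_route()
print(best, route_relativity(best))
```

Run as written, this selects the Trollshaws, the North pass, and the Elf-path, for a total of 8 Relativity points. Both tactics show up in the arithmetic: the South pass loses because it adds an extra uncertain area, and the Elf-path wins because its single score is the lowest.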
What Is Flux?
Any unknown factor is known as Flux. Each section on our fictional journey map that contains an unknown factor is what I like to refer to as a Flux-Bridge — a hypothetical bridge of unknown length. You have no way of knowing how long a Flux-Bridge is until you cross it; it could be six inches, six miles, or six light-years.
The trick, then, is not to try and guess how long the Flux-Bridges are, which is the shortcoming of most estimation techniques. Instead, you focus on counting how many Flux-Bridges exist on your chosen route.
It’s important to remember that while not every Flux-Bridge will be light-years long, any Flux-Bridge could be. The point is to improve your odds at this game of chance called estimation by minimizing how many Flux-Bridges you have to cross.
The one certainty about a Flux-Bridge is that the route back is almost always predictable. If you discover that crossing is infeasible (r5), you can backtrack and pick a different route.
Borrowing from a different Tolkien story, Gandalf originally meant the Fellowship of the Ring to cross the Misty Mountains at Caradhras. In terms of Quantified Tasks, that option had a lower Relativity, as the route was known and evil creatures were unlikely, leaving only the magical weather as an unknown (r2).
Going through Moria had a much higher Relativity (r4), since they didn’t know the entrance, the darkness would make navigating hard, evil creatures abounded, and there was no guarantee that any given route was passable.
Planning aside, once it was clear that the magical weather on Caradhras was going to make that route impossible, Moria — the higher Relativity path — became the better option. The moral here is, despite your best efforts to minimize Flux-Bridges, reality may still force you to contend with the higher-Flux path. Such is the nature of estimation.
Relativity in Release Planning
When selecting Epics and Stories for a Release, we can factor in their Relativity scores in the same two ways:
Limit how many high-Relativity Epics and Stories we plan for a Release.
Choose options which minimize Relativity.
Impact and Gravity, the two Quantified Task measures which contribute to Release planning, must be taken into account. These are defined by the product owner and stakeholders (e.g. the client or users). Relativity becomes a key communication tool by which the Developers can influence Release planning.
For example, consider two Gravity 5 Epics: Single Sign-On (SSO) support for an existing user authentication process, and email reminders for unread messages. The Developers look at these two Epics, and determine the following unknown factors:
SSO Support: Total Relativity 6
We need to authenticate with a predetermined SSO provider, which we’ve done in six other products at our company. [Relativity 1]
We need a workflow to associate an existing user with SSO. This must be designed, as we know very little about what this workflow will look like. [Relativity 4]
We need a new button on the login screen for using SSO. [Relativity 1]
Email Reminders: Total Relativity 11
We need to find a solution for sending emails, either pre-packaged or custom-built. We only know what we need it to do, not what options are viable for our product. [Relativity 3]
Some users have email addresses, but we don’t know if they’re valid. We also need users without email addresses to add them to their profiles. This requires us to create a new workflow for adding and verifying email addresses. We have a workflow for accepting the Terms of Use, which we may be able to mimic in part. [Relativity 3]
At present, we do not have event scheduling, instead taking action (e.g. updating the notification icon) when a page is loaded. We will need to add an event scheduler, and we don’t know what this will look like. [Relativity 4]
Different users will want emails about different messages, so we’ll need to expand the settings with some new fields. We’ve done this before. [Relativity 1]
Just from that initial exploration, we know that SSO Support has five fewer Relativity points. However, notice that this doesn’t necessarily mean the SSO Support feature will take less time to implement, only that the Email Reminders feature has more uncertainty up front.
Imagining we also have a few bugs that must be resolved in this Release, we may decide that the SSO Support Epic is the feature we develop this time, as it minimizes how much uncertainty we take on in a single development cycle.
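As a minimal sketch, the Total Relativity comparison above is just a sum per Epic. The scores are the ones listed above; the `EPICS` dictionary, its shortened factor labels, and the `total_relativity` helper are hypothetical names used only for illustration.

```python
# Relativity score for each unknown factor, as listed above.
EPICS = {
    "SSO Support": {
        "Authenticate with the SSO provider": 1,
        "Workflow to associate an existing user": 4,
        "SSO button on the login screen": 1,
    },
    "Email Reminders": {
        "Email-sending solution": 3,
        "Adding and verifying email addresses": 3,
        "Event scheduler": 4,
        "New settings fields": 1,
    },
}


def total_relativity(factors):
    """Total Relativity of an Epic is the sum of its factors' scores."""
    return sum(factors.values())


totals = {name: total_relativity(factors) for name, factors in EPICS.items()}
least_uncertain = min(totals, key=totals.get)

print(totals)
print("Least up-front uncertainty:", least_uncertain)
```

The totals come out to 6 for SSO Support and 11 for Email Reminders, matching the figures above. Keep in mind that the result ranks the Epics by uncertainty, not by how long they will take.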
Lowering Relativity with Agile Spikes
If our team has a bit of available energy up front, and we know we want to complete both Epics at some point, we may want to lower the Relativity on both through some requirement gathering, design work, and research. In Agile, a Spike is a timeboxed research- or experimentation-oriented task.
In Quantified Tasks, a Spike can sometimes be used to lower the Relativity of a Story or Epic up front. For our two Epics, we might plan the following Spikes:
Design the workflow for associating an existing user with SSO.
Evaluate and compare email-sending solutions, with a given budget for an external solution, if desired.
Design the workflow for validating email addresses.
Determine requirements for event scheduling.
After these spikes, we have new information.
The SSO association workflow is completed, but the work uncovered additional uncertainty around forcing explicit SSO reauthentication, which is needed to prevent mistakes from associating the account with the wrong SSO. [Relativity 3]
Of the four email-sending solutions proposed, the team decides to proceed with one of the external solutions; it is within budget, meets our needs, and has good documentation and precedent in the product’s tech stack. [Relativity 1]
The email address validation workflow is completed, and assuming the ability to send emails, is fairly deterministic to build. [Relativity 1]
The product’s tech stack has a built-in event scheduling system that was previously overlooked. [Relativity 2]
Updating our Epics, we now see that the SSO Support Epic still has a Total Relativity of 6, but the Email Reminders Epic now has a Total Relativity of 5, meaning it now has less uncertainty than the first.
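To make the arithmetic concrete, here is a small sketch of how the Spike results change the Email Reminders total. The scores are taken from the lists above; the dictionary labels and the `apply_spike_results` helper are hypothetical. The SSO Support Epic is left out because its updated factors are not itemized above, only its total.

```python
# Email Reminders factors before the Spikes (scores from the earlier list).
email_before = {
    "Email-sending solution": 3,
    "Email address workflow": 3,
    "Event scheduling": 4,
    "New settings fields": 1,
}

# Re-scored factors after the Spikes. The settings fields had no Spike,
# so that score is unchanged.
spike_results = {
    "Email-sending solution": 1,
    "Email address workflow": 1,
    "Event scheduling": 2,
}


def apply_spike_results(factors, results):
    """Return a copy of the factors with re-scored entries replaced."""
    updated = dict(factors)
    updated.update(results)
    return updated


email_after = apply_spike_results(email_before, spike_results)
print(sum(email_before.values()), "->", sum(email_after.values()))
```

This prints `11 -> 5`, the drop described above.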
If the organization wants to push an update to the product as soon as possible, the Epic with less Relativity may be a better option. The other Epic can be deferred to a subsequent Release, in an effort to keep Relativity to a minimum, and the rest of the developer energy for this Release can be dedicated to fixing some low-Relativity bugs and paying down a bit of technical debt.
Relativity, then, allows a team to control for how much uncertainty — and thus, risk — they take on in any given development cycle.
Relativity or Friction?
Remember: Friction is related to our known obstacles, while Relativity is related to unknown factors. You only want to count each factor once in your estimates, and since Friction is added to an estimate while Relativity multiplies it, it matters which measure you use to capture a factor! However, multiple factors can also seem to blur together.
For example, some teams struggle to identify whether poor documentation contributes to Friction or Relativity. The answer really depends on why the documentation matters. If the task being estimated involves a known design pattern, but that pattern is undocumented in the tech stack, this would be a high-Friction, low-Relativity task, since lack of knowledge is the only complicating factor.
On the other hand, if a viable design pattern is itself unknown, that contributes Relativity, since the knowledge of how to use different design patterns in the tech stack does not resolve the uncertainty.
A quick way to differentiate between Friction and Relativity is to ask three questions:
Can we describe one or more solutions in abstract terms?
No Solutions: Relativity 4
Multiple Solutions: Relativity 3
One Solution: Relativity 2
How confident are we that at least one of the solutions will be viable?
Low Confidence: +1 Relativity
Moderate Confidence: Same Relativity
High Confidence: -1 Relativity
How many of these solutions are well-known and well-documented in our tech stack?
None: Friction 4-5
Some: Friction 2-4
All: Friction 1-2
The actual Friction score for the task may change when you narrow the field of possible solutions.
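The three questions can be read as a small decision procedure. The sketch below is one possible encoding, not an official Quantified Tasks algorithm. The function names and the `"low"`/`"moderate"`/`"high"` labels are hypothetical. Clamping the result to the r1 to r5 scale, and defaulting to moderate confidence when nothing else is given, are assumptions made here.

```python
def estimate_relativity(solution_count, confidence="moderate"):
    """Questions 1 and 2: starting Relativity, adjusted by confidence."""
    if solution_count == 0:
        base = 4          # no solutions
    elif solution_count == 1:
        base = 2          # one solution
    else:
        base = 3          # multiple solutions
    adjustment = {"low": 1, "moderate": 0, "high": -1}[confidence]
    # Assumption: keep the result on the r1..r5 scale.
    return max(1, min(5, base + adjustment))


def friction_range(documented_count, solution_count):
    """Question 3: Friction range from how many solutions are well documented."""
    if solution_count == 0 or documented_count == 0:
        return (4, 5)     # none
    if documented_count == solution_count:
        return (1, 2)     # all
    return (2, 4)         # some


# One known pattern, high confidence it works, but undocumented in our stack:
print(estimate_relativity(1, "high"), friction_range(0, 1))
```

The example call corresponds to the high-Friction, low-Relativity case: a single well-understood solution that simply isn't documented in the tech stack.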
Accepting Flux: The Nature of Estimation
As Robert Burns aptly reminds us, “The best laid schemes of Mice and Men, oft go astray.” No matter how much we plan, uncertainty can still bite us in the rump. Estimation is still just an educated guess.
Although bridge building scarcely provides a good analogy for software engineering, perhaps another branch of the construction trade is better considered: home renovation. Even the best contractor can draw up plans, schedules, and detailed cost estimates, and still have her entire forecast dashed when she opens a wall and finds termites have chewed through the support beams.
This is why, in Energy Points estimation, we take the sum of Distance and Friction and multiply it by Relativity. By scaling up our estimates in proportion to their uncertainty, we plan for all Flux to take additional developer energy. Then, if at least some Flux-Bridges turn out to be shorter than anticipated, we can hopefully adjust our budget of developer energy to handle the Flux-Bridges that are taking longer than anticipated. In practice, this helps minimize how far over our estimated delivery dates we actually run.
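As a rough sketch, the calculation described above looks like this. The function name and the sample scores are hypothetical, chosen only to show how the same work grows with uncertainty. The range check on Relativity reflects the r1 to r5 scale described earlier.

```python
def energy_points(distance, friction, relativity):
    """Energy Points: (Distance + Friction) multiplied by Relativity."""
    if not 1 <= relativity <= 5:
        raise ValueError("Relativity is scored from 1 (r1) to 5 (r5)")
    return (distance + friction) * relativity


# Hypothetical task: Distance 3, Friction 2, scored at two Relativity levels.
well_understood = energy_points(3, 2, 1)
mostly_unknown = energy_points(3, 2, 4)
print(well_understood, mostly_unknown)
```

With virtually no unknowns, the task costs 5 Energy Points. At Relativity 4, the same Distance and Friction cost 20, which is the buffer set aside for crossing Flux-Bridges.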
Even so, sometimes you have to take a step back and rethink. That’s where Agile is so important! Teams must be prepared to adapt their plans to new information. Quantified Tasks is what buys the team that time to adapt, and helps them communicate the causes of delays to stakeholders in a way that lets those stakeholders make decisions.
Often, identifying uncertainty makes all the difference.