Measuring Developer Productivity...for Real!
Quantified Tasks is the first objective technique for measuring tasks, and that objectivity makes it possible to measure Developer accomplishment as well.
Over the years, Managers have faced the challenge of measuring Developer productivity. There have been many failed attempts at solving this:
Counting lines of code written encourages writing complicated code, and doesn't reward Developers for improving efficiency and code health, which will typically result in negative line counts.
Counting shipped features discourages Developers from paying down technical debt and maintaining legacy systems, as such work becomes invisible and thankless.
Counting fixed bugs encourages creating bugs. (As Wally in “Dilbert” said, “I’m going to write a new minivan!”)
Counting hours worked punishes efficiency and encourages overwork and poor time management.
Counting closed tasks makes no distinction between easy tasks and hard tasks. A Developer who closes a dozen easy frontend fixes looks better in this light than the one Developer who built an innovative AI model to respond to search requests.
Counting story points never made sense, because who can say what one story point is even worth?
There have even been some recent attempts at solving this which have proven naïve and ill-informed, as Kent Beck and Gergely Orosz recently pointed out:
The question remains, how do we measure Developer productivity and accomplishment?
Quantified Tasks provides the first reliable, objective method of estimating and measuring how much Developer energy is needed to complete a task. (Read more.) Meanwhile, Impact and Gravity help align priorities to stakeholder needs. All of these provide insight into Developer effort and outcomes.
Because these scores are defined on tasks from sprint to sprint, they provide insight much earlier than periodic employee reviews or surveys. Better still, if a team is properly using Quantified Tasks, their estimates serve as leading indicators of project and team health. Not only can these provide valuable insights into Developer performance, but they also serve as a communication tool between Management and Developers.
Although this data is only part of a larger picture, Quantified Tasks represents a leap forward in measuring development work.
A Few Warnings
While Quantified Tasks provides a more accurate and nuanced system of measuring accomplishment, caution on the part of the Manager is warranted on a few fronts!
Implementing Quantified Tasks
Quantified Tasks is primarily a tool for estimation and planning, and it must be embraced by the team before it will provide useful productivity insights. You can verify that teams are using the methodology correctly by ensuring that estimates are reasonably accurate (remembering that estimates are still estimates), and that sprints and releases are being completed in alignment with the Planning measures (Impact, Gravity, Priority) and the actual business and stakeholder needs.
For example, if a conscientious Developer spends a sprint mitigating an unforeseen security issue that could have cost the company millions if unresolved, but that security issue was never properly identified as having a high Impact, it may appear that the Developer’s priorities were out of line.
Similarly, if a team originally estimates too high, but their estimates become more realistic as they become familiar with Quantified Tasks, their Velocity will appear to come down. This could be mistaken for a decrease in effectiveness until one realizes that they’re learning the methodology.
Respecting Goodhart’s Law
“When a measure becomes a target, it ceases to be a good measure.” -Goodhart’s Law
If Managers disregard Goodhart’s Law and define targets for Quantified Tasks measures (especially Energy Points and Velocity), there is a very real temptation for teams to artificially inflate their scores.
Instead, use the totals and averages of these measures as a starting place for assessing performance. For example, rather than requiring that a Developer complete a minimum of 40 Energy Points per sprint, thereby destroying the usefulness of Energy Points as an estimation tool, a good Manager will notice when a Developer is trending higher or lower than their own average, and investigate why.
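As a rough illustration of watching a trend instead of enforcing a target, the sketch below compares a Developer's latest sprint against their own average. The function name, the 25% tolerance, and the sample numbers are all assumptions made for this example; none of them come from Quantified Tasks itself.

```python
from statistics import mean


def velocity_deviation(history, latest, tolerance=0.25):
    """Compare the latest sprint's Energy Points to a Developer's own average.

    history: Energy Points completed in earlier sprints (hypothetical data).
    tolerance: assumed fraction of the average treated as normal variation.
    Returns "higher", "lower", or "typical" as a prompt for a conversation.
    """
    baseline = mean(history)
    change = (latest - baseline) / baseline
    if change > tolerance:
        return "higher"
    if change < -tolerance:
        return "lower"
    return "typical"


past_sprints = [30, 34, 32, 28]  # made-up Energy Point totals; the mean is 31
print(velocity_deviation(past_sprints, 33))  # within 25% of the average
print(velocity_deviation(past_sprints, 20))  # more than 25% below the average
```

The result is only a cue to ask why the change happened. It is never a score in itself.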
Every Developer needs to be able to form a personal relationship to the Estimation measures, especially Energy Points. This means that direct comparisons between Developers are seldom appropriate, and never for evaluating performance. A Developer must only ever need to “compete” against themselves, and external factors like life circumstances and familiarity with the tech stack must be taken into account.
In short, both Team and Personal Velocity must only be part of a larger picture.
Measures as Performance Metrics
Most of the measures in Quantified Tasks can be used as raw productivity metrics. Again, by “raw,” I mean that the metrics provide clues for where further inquiry is needed.
Impact
The total Impact of tasks completed by a Developer indicates how much user value they've provided, based on the impact analysis performed by the Product Owner and Stakeholders.
The average Impact indicates the alignment of the Developer’s work to Stakeholder goals. Remember that Developers are only supposed to work on tasks which have been identified as part of the Release (Gravity) and Sprint, so if there is poor alignment, Managers must confirm whether Release and Sprint planning were properly aligned to Impact.
Gravity
The total Gravity of tasks completed by a Developer indicates how much value they contributed to the Release.
The average Gravity indicates how much alignment there was between the Developer’s work and the goals of the Release.
Remember that Priority and Gravity are necessarily distinct concepts. For example, several g3 tasks may have been needed before a g5 task could practically be implemented.
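The totals and averages described for Impact and Gravity could be gathered with something like the following sketch. The task records, field names, and scores are hypothetical. A real issue tracker export will look different.

```python
from collections import defaultdict
from statistics import mean


def summarize_measure(tasks, measure):
    """Return {developer: (total, average)} for one measure over completed tasks."""
    scores = defaultdict(list)
    for task in tasks:
        scores[task["developer"]].append(task[measure])
    return {dev: (sum(vals), mean(vals)) for dev, vals in scores.items()}


# Hypothetical completed tasks; field names and scores are illustrative only.
completed = [
    {"developer": "ana", "impact": 5, "gravity": 3},
    {"developer": "ana", "impact": 3, "gravity": 5},
    {"developer": "ben", "impact": 2, "gravity": 4},
]

print(summarize_measure(completed, "impact"))
print(summarize_measure(completed, "gravity"))
```

In this made-up data, "ben" has a low average Impact but a reasonable average Gravity, which is the kind of mismatch that should prompt a look at Release and Sprint planning before any conclusions are drawn about the Developer.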
Energy Points
Energy Points indicate how much work was completed. Every Developer can complete a different number of Energy Points depending on their seniority, skill level, and familiarity with the project and technology. The number of Energy Points a Developer completes in a sprint is their Personal Velocity.
For this to work, all effort on a project MUST contribute to Velocity. This means all design, testing, documentation, code review, and deployment work (etc.) should be tracked as tasks with accurate assignments and estimates. It’s common for such “non-implementation” tasks to be left off the issue tracker, which creates the illusion that less work is being done. For example, a Senior Developer may spend eight hours a week reviewing colleagues’ code, an important and productive part of any development team’s workflow, but that work is seldom included in the issue tracker. Thus, the Senior Developer’s Personal Velocity, and the Team Velocity along with it, will appear to be lower than it actually is.
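To make the effect of untracked work concrete, here is a small sketch that computes Personal Velocity with and without a code review task. The task list, the "kind" field, and the Energy Point values are invented for illustration.

```python
def personal_velocity(tasks, developer):
    """Sum the Energy Points of one Developer's completed tasks in a sprint."""
    return sum(t["energy_points"] for t in tasks if t["developer"] == developer)


# Hypothetical sprint for one Senior Developer; all values are made up.
sprint = [
    {"developer": "sam", "kind": "implementation", "energy_points": 8},
    {"developer": "sam", "kind": "implementation", "energy_points": 5},
    {"developer": "sam", "kind": "code review", "energy_points": 6},
]

# What the tracker shows if code review is never entered as a task.
tracked_only = [t for t in sprint if t["kind"] == "implementation"]

print(personal_velocity(tracked_only, "sam"))  # 13
print(personal_velocity(sprint, "sam"))        # 19
```

The Developer did the same work in both cases. Only the second number reflects it.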
When Quantified Tasks is properly implemented, Personal Velocity can reveal a Developer's increasing skill over time, but only if compared against their own past performance, with external factors considered. Every Developer is unique, so comparison between co-workers, even at the same experience level, is strongly discouraged.
The one exception to this is in monitoring the average Personal Velocity of everyone on the team, and comparing that to the Personal Velocity of a single Developer. When a Developer’s Personal Velocity is consistently above the average, this can be a strong indicator that they are overworked, and may be in danger of burnout.
Similarly, if their Personal Velocity dips well below either their own average or that of the team, this may indicate that external tasks or other factors are diverting their attention from the project. Again, there may be valid causes for this, such as onboarding new team members, participating in meetings, or dealing with external factors. Everyone’s personal best will also change from week to week, depending on life circumstances. However, a severe or sustained dip may warrant further curiosity.
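One way to look for a Developer who is consistently above the team average is sketched below. The three-sprint window, the function name, and the data are assumptions for the example. Note that the team average here includes the Developer being checked.

```python
from statistics import mean


def consistently_above_team(velocities, developer, min_sprints=3):
    """True if the Developer beat the team average in each of the last min_sprints sprints.

    velocities: {developer: [Energy Points per sprint, oldest first]},
    with every list the same length (hypothetical data layout).
    """
    sprints = len(velocities[developer])
    if sprints < min_sprints:
        return False
    for i in range(sprints - min_sprints, sprints):
        team_avg = mean([v[i] for v in velocities.values()])
        if velocities[developer][i] <= team_avg:
            return False
    return True


# Made-up Personal Velocity per sprint for a three-person team.
team = {
    "ana": [30, 31, 29, 30],
    "ben": [28, 44, 46, 45],
    "cy": [32, 30, 33, 31],
}

for name in team:
    if consistently_above_team(team, name):
        print(f"Check in with {name}: possible overwork")
```

A flag from a check like this is a reason to have a conversation about workload. It says nothing on its own about who is performing better.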
Distance
The total Distance of tasks completed by a Developer is a raw measurement of effort, apart from research and experimentation. Remember, Distance measures how much effort would be involved if the Developer knew everything about the task and the technology involved, so this total will not include effort for research, experimentation, or learning.
Friction
The average Friction of tasks completed by a Developer indicates the complexity of the work they completed. Higher Friction indicates that a task objectively involves more research and experimentation. A high average Friction would mean the Developer probably untangled some difficult problems.
When a Developer demonstrates an upward trend in average Friction from sprint to sprint over time, it may be a good indicator they are ready to move on to a higher seniority level. When monitoring this, be sure to factor in the total and average Friction of each sprint.
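An upward trend in average Friction can be checked with a simple least-squares slope over the sprint averages, as in this sketch. The sample averages are invented, and how steep a slope counts as meaningful is left to the Manager's judgment.

```python
from statistics import mean


def trend_slope(values):
    """Least-squares slope of values against sprint index (0, 1, 2, ...)."""
    if len(values) < 2:
        raise ValueError("need at least two sprints to estimate a trend")
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical average Friction of one Developer's completed tasks, per sprint.
average_friction = [1.5, 2.0, 2.0, 2.5, 3.0]

slope = trend_slope(average_friction)
print(f"Average Friction changes by about {slope:.2f} per sprint")
```

A positive slope sustained over many sprints is the pattern described above. A single high sprint will barely move it, which is the reason to prefer a fitted trend over comparing two adjacent sprints.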
Relativity
The total Relativity of tasks completed by a Developer indicates how much uncertainty, or "flux," their work involved. Higher flux makes for more challenging work. As with Friction, an upward trend here may be a good indicator of increasing experience.
Be advised, however, that an increase in Relativity across a project likely indicates that business analysis and software engineering activities have become less effective. If this trend goes unaddressed, your team may be headed for burnout, and the project is likely to stall.
Using the Metrics
As Kent and Gergely pointed out in their article, it is not enough to simply measure a Developer’s work output. Even Quantified Task metrics are only part of a larger picture.
To recap:
Quantified Tasks must be properly implemented, continuously validated, and functioning as a reliable estimation and communication tool, aside from measuring productivity. If it isn’t useful for planning, it is worthless for insight.
All work on the project must be tracked and estimated, to ensure it contributes to Velocity. Otherwise, Velocity is an incomplete and inaccurate picture.
A good Manager will consider all metrics both on a personal and team level.
Impact alignment is just as important as Velocity.
The alignment between Impact, Gravity, and validated user value should be routinely verified.
Quantified Tasks provides some promising tools for measuring Developer productivity and accomplishment. It is still only a part of a larger picture, so I recommend reading the following as well:
Getting Started with Quantified Tasks
If you want to get started using Quantified Tasks with your team or company, check out the Overview.