What if you could accelerate software development by 43%? This isn’t yet another AI-driven pipe dream but a tangible outcome rooted in something fundamental: code quality.
Code quality is an ill-defined, technical concept that is alien to most business managers. Even within a development team, it can be hard to agree on what's good enough™.
In this article, we turn code quality from a vague, subjective concept into a measurable, actionable metric that can boost your team’s productivity and bottom line. And these aren’t empty promises: we’ll use a validated statistical model that translates Code Health scores into tangible business value – faster development, fewer defects. Let’s start by aligning on what a “good” codebase looks like.
Benchmarking: how healthy is the average codebase?
Recently, we published the first-ever benchmarking report of Code Health across various industry segments. We measured Code Health using the, well, Code Health™ metric. It's a strong metric since it's validated via peer-reviewed research and outperforms other metrics (yes, really – check out this research paper). Unlike other metrics, Code Health doesn't stop at code-level findings – it connects directly to business outcomes like development speed and defect reduction.
This article won't go into the details of the metric. (There's a good overview of how it works here.) But in short, Code Health is a composite score that evaluates maintainability based on code smells – patterns that signal design flaws or accidental complexity in source code. The resulting scores range from 10.0 (healthy code that's easy to maintain) down to 1.0 (overly complicated code that's difficult for developers to understand). So, how do we do as an industry?
As we see in the preceding table:
- Across industries, the average Code Health is below the healthy threshold. (The Code Red paper defines healthy code as 9.0 or higher).
- The code worked on the most – the hotspots – is in worse shape than the rest of the codebase.
This means that code quality, in general, is lower than desired, indicating that the average codebase is more expensive to maintain than it has to be. We'll soon look at what this means in practice; first, let's clarify why the second measure – Hotspot Code Health – is so important.
Hotspot Code Health: Focus on high-interest technical debt
Figure 1: Example of hotspots in a codebase. Visualized via CodeScene.
You see, not all code is equally important. As shown in the preceding visualization, development activity tends to cluster in a small part of the codebase. Hotspot analysis is a behavioral code analysis technique that identifies those critical parts based on version-control data. In practice, this means that any Code Health issues in a hotspot tend to become major productivity bottlenecks – it's high-interest technical debt. (See this blog on prioritizing technical debt for more details on hotspots.)
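To make the idea concrete, here's a minimal sketch of how a hotspot ranking can be derived: combine each file's change frequency (how often it appears in commits) with a complexity proxy such as lines of code. This is an illustration of the general technique only – the file names and numbers are hypothetical, and CodeScene's actual algorithm is more sophisticated.

```python
from collections import Counter

def rank_hotspots(commits, complexity):
    """Rank files by change frequency x complexity (lines of code here).

    commits: iterable of sets, each holding the files touched by one commit
             (e.g. parsed from `git log --name-only`).
    complexity: mapping from file path to a complexity proxy.
    """
    freq = Counter(f for files in commits for f in files)
    return sorted(
        ((f, freq[f] * complexity.get(f, 0)) for f in freq),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Hypothetical repository history: each entry is one commit's changed files.
history = [
    {"core/engine.py", "util/io.py"},
    {"core/engine.py"},
    {"core/engine.py", "api/routes.py"},
    {"util/io.py"},
]
loc = {"core/engine.py": 1200, "util/io.py": 150, "api/routes.py": 300}

for path, score in rank_hotspots(history, loc):
    print(path, score)
# core/engine.py dominates: changed most often AND largest -> the hotspot.
```

The key point survives even in this toy version: the ranking is driven by how the team actually works with the code (the Git history), not by static structure alone.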
As we noted in the comments on the benchmarking table above, the critical Hotspot Code Health is significantly lower than each sector's average. That sounds wasteful – and we can quantify the waste by pulling in another piece of recent research and connecting the score to developer productivity.
Connecting code quality to developer productivity
Earlier this year, we published an award-winning statistical model for translating Code Health scores into business value (faster, better). Let’s use that model to illustrate the impact of the lower Hotspot Code Health.
For this example, we’ll zoom in on the Industrial and Technological sector. This is a broad sector, including companies where software is their primary product as well as domains like telecom and IT services. The benchmarking data reveals that this sector has an average Hotspot Code Health of 5.15. Here’s what 5.15 looks like when plotted on the value curve from the statistical model mentioned above:
Figure 2: The statistical model let us translate increases – or decreases – in Code Health to business value creation in terms of speed/velocity and defect reduction.
In the preceding visualization, we plotted the three Code Health values: Hotspots, Average, and the Hotspot Code Health for the top 5% performers in the industry. These Code Health scores are shown on the X-axis, and Value Creation on the Y-axis.
Now, admittedly, "Value Creation" sounds vague. It's simply a composite metric reflecting how much faster – measured in hours – a development task becomes in healthy code, combined with a measure of the defect reduction (fewer bugs in healthy code). That makes it a valuable metric for comparison.
So with that covered, we're ready to contrast the average organization with the top performers. Their difference in Hotspot Code Health – 5.15 for the average versus 9.1 for the top 5% – translates into a whopping 27-43% improvement in development speed and a 32-50% reduction in post-release defects!
These numbers are staggering if we translate them into the harder currencies of time or dollars. Imagine a mid-sized software firm with 100 developers working on a SaaS platform. By improving Code Health to match industry leaders, they could:
- Gain roughly 77,400 additional productive hours per year (43 developers × 1,800 hours).
- Translate the gain into savings: hiring those 43 developers for real would cost around €5 million (and that's without even considering Brooks's Law).
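The back-of-the-envelope arithmetic behind those bullets is worth spelling out. The per-developer cost below is an assumption chosen to match the ~€5 million figure; plug in your own numbers.

```python
developers = 100
speed_gain = 0.43            # upper bound of the 27-43% model estimate
hours_per_dev_year = 1_800
cost_per_dev_year = 116_000  # assumed fully loaded cost in EUR (hypothetical)

# A 43% speed-up frees up capacity equivalent to 43 extra developers.
extra_hours = developers * speed_gain * hours_per_dev_year
savings = developers * speed_gain * cost_per_dev_year

print(f"{extra_hours:,.0f} extra hours/year ≈ €{savings:,.0f} in hiring costs")
```

Note that the savings estimate is conservative: actually hiring 43 more developers would also add coordination overhead (Brooks's Law), which the refactoring route avoids entirely.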
Are these realistic quality standards?
We'll soon talk about how you can leverage these numbers yourself. But before we move on, let me clarify that we're not talking about overly polished code or impossible standards: the top 5% performers aren't delivering 10.0 Code Health. A score of 9.1 indicates that even they live with some code smells – they're just careful not to let them get out of control.
Figure 3: Hotspots only make up a small portion of the overall codebase, but attract a disproportionate amount of development work. Hotspots are calculated by mining Git history.
Return on Investment: Create tangible goals for code quality
The first step towards any improvement is to get situational awareness: what is your current Code Health? Depending on the outcome, you might choose different goals:
- Above average: aim to become a top performer by elevating your Hotspot Code Health.
- Below average: set the goal of improving your Hotspot Code Health to the industry average.
However, the benchmarking data indicates that there is low-hanging productivity fruit ripe for most organizations. Remember that the typical organization had a lower Hotspot Code Health than Average Code Health?
- What if the hotspots were improved to the same level as the rest of the code?
It's low-hanging fruit because the target is the team's own code: the same people who wrote the hotspots wrote the rest of the codebase. The skillset is clearly there – it just needs to be applied to the high-impact hotspots, too.
Here’s an example of how such a business case could look:
Figure 4: An example of how benchmarking Code Health lets you build a business case for code quality improvements and technical debt remediations.
The preceding visualization illustrates three scenarios for a hypothetical codebase with a Hotspot Code Health of 6.8. (This number represents problematic code with technical debt).
- Ambitious Goal: The organization's Code Health of 6.8 is above the industry average of 5.15. Hence, a possible goal would be to climb into the top performers category, which requires a Code Health of 9.1.
- Realistic Improvement: A smaller step is to simply elevate the hotspots to the same Code Health level as the rest of the codebase. In this hypothetical codebase, the average health of all code is 8.3. Note that even this very tangible goal promises ~20% faster development in the hotspots.
- Defensive Prevention: In this scenario, you merely put a quality bar on what's already there. It's a minimum-effort goal: don't let your code get worse. Interestingly, there's a big gain in preventing a decline to the industry average, and the value of that prevention is quantifiable using the same statistical model, as illustrated in the figure above.
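The Defensive Prevention scenario lends itself to automation: a CI check that fails the build when a file's score drops below its baseline. Here's a minimal sketch of the idea – the file names and scores are hypothetical, and commercial tools such as CodeScene ship their own, far richer quality gates.

```python
def quality_gate(baseline, current):
    """Return the files whose Code Health score regressed vs. the baseline.

    Both arguments map file paths to scores on the 1.0-10.0 Code Health scale.
    Files without a baseline entry are accepted (no regression to measure).
    """
    return [f for f, score in current.items() if score < baseline.get(f, score)]

# Hypothetical scores: engine.py declined since the last release.
baseline = {"core/engine.py": 6.8, "api/routes.py": 9.1}
current = {"core/engine.py": 6.2, "api/routes.py": 9.1}

regressions = quality_gate(baseline, current)
if regressions:
    # In a CI pipeline, this is where you'd exit non-zero to fail the build.
    print("Quality gate failed for:", regressions)
```

Even this "don't get worse" policy has measurable value: every prevented decline is a decline you never have to pay interest on.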
Summing up: making code quality data-driven and trustworthy
Refactoring code at scale isn't free (although new tools aim to simplify it). But being able to quantify the gain is key to getting non-coding stakeholders to understand the trade-offs: we can either invest in improved quality and become measurably faster in the future, or we can ignore it and expect to become slower over time. Making these conversations data-driven makes all the difference.
This also means that if you look to make a business case for code quality, then using Code Health – a validated code quality model – is key: without a connection to business benefits (faster, better), a “code quality” measure would just be a vanity metric, waiting to be sacrificed on the altar of deadlines.
That said, the principles discussed here apply universally: by understanding hotspots, targeting improvements, and linking quality to demonstrable ROI, you can turn technical debt into a business opportunity. Ready to start?
Further resources:
- [Article] The Code Health algorithm – overview & how it works.
- [Whitepaper] Code Red: The business impact of code quality.
- [Research] Increasing, not Diminishing: Investigating the Returns of Highly Maintainable Code.
- [Tool] Measure your own Code Health with CodeScene free trial.
- [Tool] Free IDE plugin for Code Health review.
Acknowledgements
Thanks a lot to Arty Starr and Peter Caron for providing valuable feedback on the draft article – you rock!