Predicting Code Quality Issues Before They Happen: A Minority Report for Code
The earlier we can catch code quality issues, the better. With short feedback cycles, it's both easier and cheaper to address any issues. Typically, code quality issues are caught by tooling in a CI/CD pipeline, during a code review, or – on some occasions – via code checkers integrated into our IDEs. That's early.
But what if we could have even shorter feedback loops? What if we could predict future code quality issues while that code is still a spark in the eye of a developer? That is, before the code is even written. Such a superpower would be a true game changer. So follow along as we make it happen and predict the future by using machine learning on code.
A Code Quality Case Study: Predicting future Code Health decline in Docker
Empear recently published a short video interview where I talk about the forensic psychology roots of behavioral code analysis. Towards the end, I’m asked where we will go from here. I reply that we want to be a (benevolent) minority report for code; we want to predict the future.
The reason I said that was because I was aware of the prototypes we have in our lab. I’m delighted that I can now reveal one of these new behavioral code analyses. Let’s use Docker as a case study to demonstrate what a code quality decline might look like.
A Docker Hotspot that declines in Code Health
I regularly analyze well-known open source codebases. When analyzing Docker, I noticed that the daemon.go module has been a development hotspot for years. This is unsurprising given that daemon.go is a central part of Docker. However, its complexity trend is alarming:
The previous figure shows that the hotspot was refactored in 2016, but since then the code complexity has crept back in. Specifically, there's a steep increase starting in 2017. Would it have been possible to predict this complexity increase? Yes – let's see how.
Predicting Future Complexity Increase
First, we need some kind of baseline for code quality. In CodeScene, we use the Code Health metric. Code Health is an aggregated metric that goes from 10 (healthy code that's relatively easy to understand and evolve) down to 1, which indicates code with severe quality issues. The code health properties are chosen based on research and are known to correlate with increased maintenance costs and an increased risk of defects. The analysis platform then weights and normalizes the findings against baseline data to arrive at the code health score. With that covered, let's return to Docker.
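To make the aggregation idea concrete, here is a deliberately simplified sketch of how detected issues could be weighted and clamped onto a 1..10 scale. The issue names and weights are invented for illustration; CodeScene's actual model is proprietary and considerably more sophisticated.

```python
# Hypothetical issue weights -- each detected smell lowers the score,
# with the weight reflecting an assumed severity. These values are
# illustrative placeholders, not CodeScene's calibrated ones.
ISSUE_WEIGHTS = {
    "brain_method": 2.5,       # large method with deep nesting and many branches
    "low_cohesion": 2.0,       # module with too many responsibilities
    "code_duplication": 1.5,
    "long_parameter_list": 1.0,
}

def code_health(issues):
    """Map a list of detected issue names onto the 1..10 Code Health scale.

    10 = healthy code, 1 = severe quality problems.
    """
    penalty = sum(ISSUE_WEIGHTS.get(issue, 0.5) for issue in issues)
    # Clamp into the documented 1..10 range.
    return max(1.0, 10.0 - penalty)

print(code_health([]))                                # a clean file scores 10.0
print(code_health(["brain_method", "low_cohesion"]))  # two smells drag it down
```

The key property the sketch preserves is that the score degrades monotonically with the number and severity of findings, but never falls below the floor of 1.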
The nice thing with version-control data is that it's easy to travel in time. So armed with CodeScene's new predictive analyses, I rolled back the Docker Git repository to 2017, before the rise in code complexity, and ran some analyses. Here's what the analysis results looked like in 2017:
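The time travel itself is plain Git: find the last commit before a cutoff date and check it out. A minimal sketch of that step (the helper names are mine, not CodeScene's):

```python
import os
import subprocess

def git(repo, *args, env=None):
    """Run a git command inside `repo` and return its stdout, stripped."""
    merged = dict(os.environ, **(env or {}))
    result = subprocess.run(["git", "-C", repo, *args],
                            check=True, capture_output=True,
                            text=True, env=merged)
    return result.stdout.strip()

def checkout_state_at(repo, cutoff):
    """Detach HEAD at the last commit made before `cutoff` (e.g. '2017-01-01')."""
    rev = git(repo, "rev-list", "-n", "1", "--before", cutoff, "HEAD")
    git(repo, "checkout", "--detach", rev)
    return rev
```

Once the working tree reflects the chosen date, any analysis runs on the code exactly as it looked back then.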
Wow! So that future complexity growth in daemon.go was predicted already back in 2017, which means it could have been prevented. I'll talk more about how that's possible soon, but first, let's cover why this is interesting and useful by looking at how we eat our own dog food at CodeScene.
For the Love of Dog Food
At CodeScene, we developers use CodeScene ourselves. That means we often spot opportunities for new features. You know, like hmm, if I had this data point too, then I could answer that specific question quicker. Those ideas are then fed back and implemented in CodeScene.
The code health decline prediction came to life through that process. Larger code quality issues and significant technical debt are hard and expensive to act upon. As a consequence, we frequently noticed that once a hotspot has declined in code health, it tends to stay that way; the cost of restoring its health will always compete with more pressing immediate concerns, quite often the drive for new features.
This means that anything we can do to provide an early detection mechanism is valuable. That way, an organization can refactor proactively while it's still affordable, and avoid preventable future maintenance headaches.
Predicting declining Code Health: How it Works
So how does CodeScene pull off its predictions? Black magic? Unfortunately, reality is slightly more mundane. But not by much. Basically, we have accumulated lots of historical data from real-world codebases. This makes it possible to apply algorithms and machine learning to pick up patterns. We guide the pattern selection with our domain expertise; we have analyzed hundreds of codebases over the past years and built a decent understanding of how code evolves.
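The learning step can be illustrated with a deliberately tiny, self-contained sketch: a logistic-regression classifier trained on a handful of per-file features (complexity trend, author congestion, cohesion). The features, data, and model here are toy stand-ins for illustration only – they are not CodeScene's actual predictors.

```python
import math

def sigmoid(z):
    # Clamp to avoid overflow in math.exp for extreme inputs.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Return the estimated probability that a file's health will decline."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy training set: [complexity_trend, author_congestion, cohesion] per file;
# label 1 = the file's health later declined, 0 = it stayed stable.
X = [[0.9, 0.8, 0.2], [0.8, 0.9, 0.3], [0.1, 0.2, 0.9],
     [0.2, 0.1, 0.8], [0.7, 0.7, 0.4], [0.3, 0.2, 0.7]]
y = [1, 1, 0, 0, 1, 0]

w, b = train(X, y)
risk = predict(w, b, [0.85, 0.9, 0.25])  # a file resembling the declining ones
print(round(risk, 2))
```

The point of the sketch is the shape of the problem – features extracted from code and history, labels from what actually happened later – not the particular model, which in practice would be trained on far richer data.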
For example, a module with low cohesion and too many responsibilities might stabilize. But combine those design smells with heavy developer congestion and the potential problems can quickly grow into a real maintenance nightmare. Similarly, a complex method might be something we can live with, but if that code is a knowledge island and the only developer who understands it leaves at the same time that new features get implemented in that area, then things can turn south. Quickly.
A common theme across our predictors is that it’s rarely some property of the current code that causes a decline in quality. Rather, it’s a combination of what the code looks like together with organizational factors like overlapping team responsibilities or – often – long-term trends that start to accelerate. We couldn’t have done these predictions without data on the social/organizational aspects of code and its history.
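As a rough illustration of the social side, here is how two such organizational signals – developer congestion and knowledge islands – could be derived from plain commit metadata. The thresholds are arbitrary placeholders, not CodeScene's calibrated values.

```python
from collections import Counter

def social_signals(commits, congestion_threshold=5, island_threshold=0.9):
    """Derive per-file organizational signals from (author, file) commit pairs.

    Returns {file: {"congested": bool, "knowledge_island": bool}}.
    A file is "congested" when many distinct authors touch it, and a
    "knowledge island" when a single author dominates its history.
    """
    per_file = {}
    for author, path in commits:
        per_file.setdefault(path, Counter())[author] += 1
    signals = {}
    for path, authors in per_file.items():
        total = sum(authors.values())
        top_share = max(authors.values()) / total
        signals[path] = {
            "congested": len(authors) >= congestion_threshold,
            "knowledge_island": top_share >= island_threshold,
        }
    return signals

# Toy history: one file dominated by a single author, one touched by many.
history = [("alice", "daemon.go")] * 9 + [("bob", "daemon.go")] + \
          [(dev, "api.go") for dev in ["a", "b", "c", "d", "e", "f"]]
print(social_signals(history))
```

Signals like these only become predictive in combination with what the code itself looks like – a knowledge island in trivially simple code is rarely a problem, while the same pattern in a complex hotspot is a risk.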
Explore More and try CodeScene
I hope you are as excited about this new CodeScene feature as I am. We have used the code health decline predictions internally as part of our services and detected several examples of how CodeScene finds real, growing problems very early. The effect has been spectacular. We don't catch all future issues – that would require a true precog – but the ones we catch are relevant.
These predictive capabilities arm development organizations with the superpower of acting on quality issues at a stage where it’s affordable and relatively easy.
You can also check out the product reviews to see who else is using CodeScene and the value they get out of it.