Better Than Silver Bullets: A Milestone for Behavioral Code Analysis
My work on code analytics started 10 years ago, and the CodeScene analysis tool has been my main focus for the past 4 years.
CodeScene is the first real product built around the concept of behavioral code analysis, which is a radical departure from traditional static analysis techniques.
Over the past years, I have spoken at a ton of conferences, written two books, and published several articles and the occasional research paper on behavioral code analysis. I’ve done my best to popularize the field.
At times, this has been a lonely journey, so it’s great that more people and more companies are joining this community.
The most recent addition is GitLab, which is now entering the behavioral code analysis space. This marks a milestone for the field, and a validation for me personally that what I have been claiming for years makes sense to outsiders as well: behavioral code analysis is as close as we get to a silver bullet for making sense of large-scale codebases. The level of insight and the speed with which we get it continue to fascinate me. Behavioral code analysis also has a clear advantage over silver bullets: it’s real.
GitLab’s entry also provides validation for the CodeScene tool. We never took any venture capital, but decided to build a great product first to prove the value and business model (read the full startup story here). Since then, CodeScene has grown into a tool suite that’s used by organizations around the world; thousands of people are using CodeScene in their daily work on large-scale codebases.
However, to serve a growing community, we need to converge on a common vocabulary that clarifies the concepts. Let me explain.
What is Behavioral Code Analysis?
Behavioral code analysis identifies patterns in how a development organization interacts with the codebase it is building. That is, while the properties of the code are important, there’s even more value in learning how the code got that way and where it is trending. This information is used to prioritize technical debt, detect implicit dependencies that are invisible in the code itself, measure organizational factors like knowledge gaps, and support on- and off-boarding.
There have been tools in this space before CodeScene, like delta-flora by Michael Feathers, my own open source tools, as well as research tools like the Evolution Radar. All of these have influenced CodeScene.
From its inception, CodeScene has built on research and has been the topic of research itself, such as comparisons to static analysis.
Growing a Community
We often joke that naming is one of the hardest problems in software. Those jokes are funny because they are true. Naming is hard. The naming problem is there for a product in an evolving field too. I know, since I have made my fair share of mistakes.
I’m responsible for most of the names and concepts that you find in CodeScene. Some names are new to CodeScene, others are lifted from my books or academic research papers. What follows are some examples of how ill-chosen names cause unnecessary confusion:
- Temporal Coupling: The purpose of this analysis is to detect co-evolving modules that are modified together as part of the same logical change. The coupling analysis is my personal favorite, and I use it for a myriad of purposes, for example to reason about change impact. But the name isn’t well-chosen, and we now prefer to talk about Change Coupling. I explain why in Software Design X-Rays:
Change Coupling Both Is and Isn’t Temporal Coupling
In my previous writings, and occasionally in the tooling, you may come across the term temporal coupling instead of change coupling.
This is unfortunate since it overloads the term. The fault is all mine; I chose the temporal coupling name, unaware that it had a previous use, to emphasize the notion of co-change in time. In its original use, temporal coupling refers to dependencies in call order between different functions. For example, always invoke the function Init before calling the AccelerateToHyperspeed method, or bad things will happen. This kind of temporal coupling is a code smell and is discussed in The Pragmatic Programmer: From Journeyman to Master [Ht00].
Excerpt from Software Design X-Rays to explain the changed vocabulary
- Abandoned Code: This analysis uncovers any knowledge gaps that we might have in our codebase due to code written by former contributors. You know, the kind of code no one else has worked on and, hence, is more expensive to modify since that requires learning unfamiliar code. In my books, I call this knowledge loss, but we now prefer to measure the inverse: How high is the System Mastery of a particular module?
- Inter-Team Coordination: I’m fortunate to have a team of true world-class experts on CodeScene’s advisory board. I’m also fortunate in that all of them – as opposed to me – happen to be native English speakers. That means they can call out some of our naming issues, and “Inter-Team Coordination” is one of those. We now prefer Team Coupling since the term more accurately describes the situation where multiple teams need to work in the same parts of the code, which often indicates organizational or architectural issues.
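To make the first concept in the list concrete, here is a minimal sketch of how Change Coupling could be estimated from version-control history. It is an illustration under stated assumptions, not CodeScene's actual algorithm: the commit data is made up, the function name `change_coupling` and the `min_shared` threshold are hypothetical, and the coupling degree (shared commits divided by the average number of revisions of the two files) is just one possible way to normalize the count.

```python
from collections import Counter
from itertools import combinations


def change_coupling(commits, min_shared=2):
    """Estimate change coupling between files.

    commits: iterable of lists, each holding the files touched by one commit.
    min_shared: hypothetical threshold; pairs that changed together fewer
                times than this are ignored as noise.
    Returns (file_a, file_b, shared_commits, degree) tuples, strongest first.
    """
    revisions = Counter()  # how often each file changed
    shared = Counter()     # how often each pair changed in the same commit

    for files in commits:
        unique = sorted(set(files))
        revisions.update(unique)
        shared.update(combinations(unique, 2))

    results = []
    for (a, b), count in shared.items():
        if count < min_shared:
            continue
        # Assumed normalization: shared commits relative to the average
        # number of revisions of the two files.
        average_revisions = (revisions[a] + revisions[b]) / 2
        results.append((a, b, count, count / average_revisions))

    return sorted(results, key=lambda r: (-r[3], r[0], r[1]))


# Made-up commit history, for illustration only.
commits = [
    ["billing.py", "invoice.py"],
    ["billing.py", "invoice.py", "README.md"],
    ["billing.py", "invoice.py"],
    ["billing.py"],
    ["README.md"],
]

coupled = change_coupling(commits)
for file_a, file_b, count, degree in coupled:
    print(f"{file_a} <-> {file_b}: {count} shared commits, degree {degree:.2f}")
```

In this toy history, only the pair `billing.py` and `invoice.py` passes the threshold, which is the kind of co-evolving pair the analysis is meant to surface. Note that the sketch treats each commit as one logical change; that is a simplification, since a logical change may span several commits.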
My initial thinking is continuously evolving as I learn more by working with others in the community. I will continue to share those learnings. After all, behavioral code analysis is a young discipline, a new generation of code analysis, and leading the way means educating new users and encouraging them to explore the space. When it comes to exploration and learning, a clear and consistent vocabulary is paramount. Join in and welcome to CodeScene!
CodeScene in action: within minutes, the analyses let you build a mental model of a previously unfamiliar codebase.