Predict security vulnerabilities with behavioral code analysis

Security vulnerabilities correlate with low code health, development hotspots, and high author churn in the organization.

In this article we look deeper at the findings and show how you can measure these critical facets of software on your own codebase.

Code Quality should be a Business Priority

It’s easy to dismiss code quality as something technical. Something that’s a responsibility for the development team. That’s a fallacy – code quality needs to be a priority for the business too:

Business risks and customer impact: Low code health leads to technical debt, and the average organization wastes 23-42% of its development time due to technical debt.
Security vulnerabilities: Recent research shows that low code health leads to a high number of total security errors!

While an organization can accept low developer productivity, security vulnerabilities are a show stopper. Or let me put it this way: if low developer productivity means a slow death for your business, a data breach could be a sudden death. Let’s look at the data behind it so that your organization can act early and prevent future risks.

Security Errors: Code + People

My previous blog posts have covered the productivity and business impact of code health issues, so let’s focus on the security aspect here.

A recent study, “The Presence, Trends, and Causes of Security Vulnerabilities in Operating Systems of IoT’s Low End Devices” (2021) by Al Boghdady, Wassif, and El Ramly, investigates the presence of security vulnerabilities in some of the most popular open source IoT codebases via the CodeScene analysis platform. What’s fascinating is that the study didn’t just look at code, but also at human factors like team experience and at evolutionary properties of the code.

To really understand vulnerabilities, we need to understand the whole system; making sure you have the right people and teams with active mastery of how the system is built is vital. The study confirms this by identifying a “strong negative correlation between security error density and the Qualitative Team Experience”. In other words, the more experience the team has in the domain and codebase, the fewer security errors.

Organizations that fail to build system mastery have more security errors (graphs via CodeScene).

It’s worth pointing out that while there are several code scanning tools, organizational factors like system mastery are invisible in the source code itself; using a behavioral code analysis tool like CodeScene allows us to see this invisible dimension.

The study also found a direct link between the code health metric and security issues: “CodeScene shows that the low Code Health of IoT OS leads to a high number of total security errors”. Code health is an aggregated metric designed to classify code with respect to correctness and ease of understanding. The study found that the violation of certain code health properties like Brain Methods, DRY violations, Bumpy Road, and Developer Congestion also leads to high numbers of vulnerabilities.

Finally, combining code health with the temporal dimension of an evolving codebase gives another powerful indicator: “CodeScene also indicates strong positive correlation between security error density (errors per 1K SLOC) and the presence of hotspots”. Hotspots are complicated code that the developers have to work with often:

Hotspots are complicated code that the developers have to work with often, which correlates with security issues.
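
To get a first feel for these two measures on your own codebase, here is a minimal Python sketch. It is not CodeScene's algorithm: it approximates a hotspot as change frequency multiplied by lines of code, which is only a crude proxy for "complicated code that is worked on often". The function names are made up for illustration, and the input is assumed to be the text produced by `git log --name-only --pretty=format:`.

```python
from collections import Counter


def change_frequency(git_log_text: str) -> Counter:
    """Count how many commits touched each file.

    Assumes the text comes from: git log --name-only --pretty=format:
    (one file path per line, blank lines between commits).
    """
    return Counter(
        line.strip() for line in git_log_text.splitlines() if line.strip()
    )


def rank_hotspots(revisions: Counter, sloc: dict[str, int]) -> list[tuple[str, int]]:
    """Rank files by revisions * SLOC, highest score first.

    Assumption: size stands in for complexity, which a real code health
    metric measures far more carefully.
    """
    scores = {path: revisions[path] * lines for path, lines in sloc.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def error_density(security_errors: int, sloc: int) -> float:
    """Security errors per 1K source lines of code, as defined in the study."""
    if sloc <= 0:
        raise ValueError("sloc must be positive")
    return security_errors / (sloc / 1000)
```

Feed `rank_hotspots` the revision counts together with a line count per file (from any line counting tool), and feed `error_density` the number of findings reported by your static analyzer. Files that score high on both are the natural place to start looking.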

Minimize the risk for security vulnerabilities

The research study used static analysis tools like Flawfinder and RATS to identify known vulnerabilities in the source code. Static analysis tools are useful and highly recommended, as they can catch many issues early in the development cycle. Complement the static checks with a behavioral code analysis tool like CodeScene in order to:

Reveal critical development hotspots so that these can be supervised and refactored if/when needed.
Enable automated code reviews that catch trends in code health decline, so that you can improve vulnerable areas of your code proactively.
Build awareness of system mastery and mitigate off-boarding risks guided by data.
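
The off-boarding risk in the last point can be roughly approximated from version control data. The sketch below is an assumption-laden illustration, not CodeScene's system mastery metric: it treats commit counts as a stand-in for knowledge and reports, per file, the share of commits made by people who have since left. The function name and the input format are hypothetical.

```python
from collections import Counter


def knowledge_loss(
    commits: list[tuple[str, str]], former_authors: set[str]
) -> dict[str, float]:
    """Per file, the fraction of commits made by authors who have left.

    `commits` holds (author, file_path) pairs, one pair per file touched
    in a commit. Assumption: commit counts approximate knowledge.
    """
    total: Counter = Counter()
    lost: Counter = Counter()
    for author, path in commits:
        total[path] += 1
        if author in former_authors:
            lost[path] += 1
    return {path: lost[path] / total[path] for path in total}
```

A value close to 1.0 means that almost everything the organization knew about that file walked out the door, which is where the study suggests security errors are more likely to accumulate.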

Get a detailed, automated code review on each Pull Request to avoid critical code health declines.

But perhaps the main advantage of the CodeScene platform is that it lets everyone – developers, architects, non-technical stakeholders – share the same view of how healthy the codebase is. That way, you can communicate in a context where everyone working on the product has the same situational awareness.

Try it out

CodeScene’s analyses are completely automated, and the tool is available as an on-premise version or as a hosted service at CodeScene Cloud. To try it out, create a free account, or choose a paid plan for larger projects. CodeScene is free for all open source repositories and (very) affordable for closed source projects.
