DHS report: Open-source code “quality” is up

A U.S. Department of Homeland Security-sponsored project has not only found that the quality of open-source software code has improved significantly over the past two years, it has also debunked a widely held assumption that longer functions within source code are associated with an increased number of code defects.

The findings come as part of an ongoing three-year, $300,000 project between the DHS and source-code analysis vendor Coverity designed to help open-source software developers find and fix vulnerabilities in their projects. To date, the project has analyzed more than 55 million lines of code from more than 250 open-source projects.

One of the notable conclusions from the scanning project was a 16 percent average drop in defect density, the number of defects detected per 1,000 lines of code in open-source projects, David Maxwell, Coverity's open-source strategist, told SCMagazineUS.com. The initial average defect density found by static analysis in 2006 was 0.30, or roughly one defect per 3,333 lines of code; the current scan shows a rate of 0.25, or roughly one defect per 4,000 lines of code.
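The figures quoted above are consistent with a density measured in defects per 1,000 lines of code. That unit is an inference from the arithmetic, and the function names below are illustrative rather than anything taken from Coverity's tooling. A minimal sketch of the conversion:

```python
def lines_per_defect(density_per_kloc):
    """Convert a defect density (defects per 1,000 lines) to lines per defect."""
    return 1000 / density_per_kloc


def percent_drop(old, new):
    """Relative decrease from old to new, expressed as a percentage."""
    return (old - new) / old * 100


initial_density = 0.30  # 2006 baseline quoted in the article
current_density = 0.25  # current scan quoted in the article

print(f"2006: one defect per {lines_per_defect(initial_density):,.0f} lines")
print(f"now:  one defect per {lines_per_defect(current_density):,.0f} lines")
print(f"drop: {percent_drop(initial_density, current_density):.1f} percent")
```

The computed drop comes out slightly above the quoted 16 percent, which suggests the article truncated rather than rounded the figure.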

Another point was the debunking of programmers' long-held assumption that writing longer functions, or code strings, just naturally leads to a greater number of defects.

"We found those to not be correlated," Maxwell said. "That goes against common expectations. A lot of programmers feel that longer functions contain more defects not only because it's more lines of code, but because it's more difficult to write good code as functions become longer. That seems to not be the case, and contradicts popular beliefs."

The project also deflated another theory: that the bigger the software development project, the higher the proportion of coding errors it contains.

"Another interesting comparison was the relationship between code base size [the number of lines of code] and the number of defects identified,” Maxwell said. "We found there's almost a 72 percent correlation between those numbers."

He said bigger projects do contain more defects, but many programmers believe that the rate of defects also rises as a project gets larger: that size not only introduces more defects, but introduces them exponentially.

"Our analysis, however, shows the growth appears to be linear," Maxwell said.

The most common type of code defect among the 13 billion combined lines of code analyzed so far is the null-pointer dereference, which made up 28 percent of those found. Resource leaks accounted for 26 percent.

Both could cause security vulnerabilities in an open-source project, Maxwell said. They could cause an application to crash or open the door to a denial-of-service attack, among other problems.

By contrast, dynamic buffer overruns and unsafe use of negative values made up a mere 0.3 and 0.2 percent, respectively, of the defects uncovered.

Coverity uses its Prevent static-analysis software to analyze the source code of each program in the project.