Quantitative security metrics without numbers

Cybersecurity presents a conundrum.

Because security is so complex, we have created a set of operational metrics to put a warning system around the problem, but are we keeping score in the right game? The limitation of focusing exclusively on metrics is that you can't measure everything, and you may not be measuring the right things or in the right context.

Your adversary is expert, patient, has a plethora of strategies at his disposal, and he only needs to be right once to breach your defenses.

Metrics have worked in security because it is easy to count something; at the same time, it is much harder to formally describe exactly what is being counted.

Today's metrics also don't tell us anything about what is not being counted.

Why is the understanding of what is being counted important? Because experts aren't the only ones who must make decisions based on this information, and interpretation errors could be catastrophic.

Organizations use numerical systems primarily to categorize objects into actionable categories. For example, age acts as a numeric system for “legal to drink”; credit scores categorize applicants into “worthy of a loan.” Numeric systems are a great solution to simple problems, but they are not the only method of classifying information into meaningful sets.

Let us look at the practical application of security metrics. Given a general vulnerability class, it is easy to end up with so many members that the data is no longer actionable.

CVSS (Common Vulnerability Scoring System) is a valuable security standard that uses common attributes to rank vulnerabilities into subclasses that are more actionable, but like all multivariate ranking systems, it has limitations we must accept.

CVSS scores run from 0.0 to 10.0 in increments of 0.1, which gives a range of about 100 points that we can view as subclasses. In theory, this level of granularity will work, but in practice, it is often too coarse or too fine because vulnerabilities are never distributed equally across the available range.

In any large enterprise, ranking systems begin to break down fairly quickly. If you have 100,000 endpoints, you end up with too many vulnerabilities that share the same score. The distribution is never even, and it is quite possible to end up with too many 9s, for example. What do you do when you have 30,000 9s?
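A minimal sketch of this clustering problem, using made-up data: it groups findings into whole-number score bands and counts how many land in each. The `findings` list and the `band_counts` helper are hypothetical, chosen only to illustrate how a score stops being actionable once most findings share it.

```python
from collections import Counter

def band_counts(scores):
    """Count findings per whole-number score band (for example, 9.8 falls in band 9)."""
    return Counter(min(int(score), 10) for score in scores)

# Hypothetical scan results: most findings cluster near the top of the range.
findings = [9.8] * 6 + [9.1] * 3 + [7.5] * 2 + [4.3]

counts = band_counts(findings)
for band in sorted(counts, reverse=True):
    print(f"band {band}: {counts[band]} findings")
```

In this toy data set, nine of the twelve findings fall in the same band, so the score alone gives no guidance on which of them to fix first.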

Another drawback of scoring systems is that they look at vulnerabilities individually. Some of today's threat profiles combine multiple low-scoring vulnerabilities in series, and there is no way to compute the score of this composite vulnerability. The relationships of vulnerabilities to each other and to the business are often more important than the vulnerabilities themselves when computing the risks they present.

To cope with the increasingly complex threatscape, security metrics need to evolve in ways that allow us to share the meaning of the things we are counting.

Semantic standards from the World Wide Web Consortium (W3C), such as RDF, OWL, and SPARQL, make it possible to use reasoning engines to compute membership in sets in ways that offer a richer, more complex problem description. These tools also make it possible to leverage the formalism of logic at a more fundamental level than numeric systems.

The power of semantic technology is the ability to explicitly model what should be inferred given some relationship. By using this power, we can finally serve many different viewpoints in an organization without compromise.

Semantic technology shares a limitation with human cognition: a model is only as useful as its semantics are stable.

Consider a simple example. In our society, maiden names are always associated with women, and a woman who has a maiden name is married. Based on an assertion that Jean has the maiden name Jones, I infer two new pieces of information: 1) Jean is married and 2) Jean is a woman.
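The maiden-name inference can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how an OWL reasoner is implemented: the triple layout, the `infer` function, and the predicate names (`hasMaidenName`, `maritalStatus`, `isA`) are all hypothetical. In RDF Schema terms, the rule loosely resembles declaring a domain for a property, so that any subject using the property is inferred to belong to a class.

```python
def infer(facts, rules):
    """Forward-chain over (subject, predicate, object) triples until nothing new appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for subject, predicate, _obj in list(facts):
            # Each rule says: a subject that uses this predicate also has these facts.
            for new_predicate, new_obj in rules.get(predicate, []):
                fact = (subject, new_predicate, new_obj)
                if fact not in facts:
                    facts.add(fact)
                    changed = True
    return facts

# Hypothetical rule: anyone with a maiden name is married and is a woman.
RULES = {"hasMaidenName": [("maritalStatus", "married"), ("isA", "Woman")]}

asserted = {("Jean", "hasMaidenName", "Jones")}
inferred = infer(asserted, RULES) - asserted
print(sorted(inferred))
```

One asserted fact yields two inferred facts, and neither was stated explicitly anywhere in the data. The rule is only as reliable as the assumption behind it, which is why stable semantics matter.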

There are stable semantics within every part of an organization that can be leveraged to compute compliance and risk; we just need to put them to use. If we apply the power of inference to security, we can build information-rich models that rely on relationships and context to compute risk.

For example, we can describe servers as members of sets like “secret,” “public,” and “compliant.” Then, it is possible to apply a model that computes a server's membership in a given set from its relationships to other classes and variables.

For instance, if a file system for a server suddenly has company confidential information on it, it will immediately become a member of the “secret” class; if it is both a member of “secret” and “public”, we have big problems and it should become a member of the “take-immediate-action” class within IT operations.
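A rough sketch of this kind of derived membership, assuming a hypothetical server record with an `internet_facing` flag and a list of files that may be marked `confidential`. The field names, the `classify` function, and the example paths are illustrative assumptions; a real deployment would express the same rules in an ontology and let a reasoner compute the classes.

```python
def classify(server):
    """Return the set of classes a server belongs to, derived from its current properties."""
    classes = set()
    if server.get("internet_facing"):
        classes.add("public")
    if any(f.get("confidential") for f in server.get("files", [])):
        classes.add("secret")
    # A server that is both secret and public needs urgent attention.
    if {"secret", "public"} <= classes:
        classes.add("take-immediate-action")
    return classes

# Hypothetical internet-facing web server with only public content.
web01 = {"internet_facing": True, "files": [{"path": "/srv/www/index.html"}]}
before = classify(web01)

# A confidential file lands on the server; membership is recomputed from the facts.
web01["files"].append({"path": "/srv/www/payroll.xlsx", "confidential": True})
after = classify(web01)

print(sorted(before))
print(sorted(after))
```

Nobody assigns the server to “take-immediate-action” by hand. The membership follows from the other two classes, so it changes as soon as the underlying facts change.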

This model-based approach to networks, vulnerabilities, and security threats makes it possible to compute complex security risk in a much more actionable way and, more importantly, to compute the risk unique to a specific network or enterprise. It also allows us to model security problems at higher logical levels for members of the enterprise who are not security experts.

W3C semantic technology has been successfully applied in many other domains that face problems similar to security: multiple technologies, multiple data feeds, multiple viewpoints, and variance in the level of domain expertise. Adoption, however, is still slow.

In some ways, the adoption cycle is similar to when relational databases threatened hierarchical databases; it requires a change in technology and, more importantly, a change in the way we think about the problem.

Information security presents a unique challenge because we face adversaries that actively exploit our weaknesses. Numeric quantitative methods work for some situations, but semantic technology offers new solutions to complex security problems that are not resolved by today's multivariate ranking systems.
