Creating Information Security Performance Measures: Five Recommendations for CIOs and CISOs
Chief information officers (CIOs) have developed reliable performance measures for most aspects of their job. For example, anyone who has worked on a help desk or managed a network knows that there are specific performance expectations related to response time, cost per unit, and efficiency. These performance metrics are quantifiable, relate to actual dollars and cents, and correlate to enterprise objectives of situational awareness and continual performance improvement. But information security presents a more complex measurement challenge.
I believe that CIOs and CISOs can measure the actual operational performance of their information security investments, as well as the linkages among performance, cost, and benefit. For CIOs who wish to close this gap in their performance management, the first half of this article presents the business case for security performance measurement. The second half develops five recommendations for achieving a sound business and technical architecture for information security performance measurement.
Why Does the CIO Need Security Performance Measures?
The Situation is Getting Worse
Security problems continue to appear at an alarming and accelerating rate. For example, the number of information security incidents reported to US-CERT has increased exponentially during the last three years:
• 2002: 490,000 events reported
• 2003: 1.4 million events reported
• January to June 2004: 56 million events reported
According to Jose Nazario at Arbor Networks, "The speed with which defenses need to be established only grows as time goes on." Malicious programs such as Code Red took a few days to reach their peak levels of infection. Newer worms, such as Sapphire, traversed the entire world in only five minutes.
The intensity level at which these problems are occurring has become an enormous distraction for most areas of IT.
• CIO Magazine, citing an Intel white paper, states that in 2002, Intel applied over 2.4 million software patches. Bug fixes are being released on average every 5.5 days, and the time to react to vulnerabilities is getting shorter.
• The urgency of the operational requirement to apply software patches is increasing too. According to the Symantec Internet Security Threat Report, 39% of vulnerabilities are exploited by hackers within 0 to 6 months of discovery, and 64% within 0 to 12 months of discovery.
• Research firms such as the Aberdeen Group, examining the threat and vulnerability management space, also concur: "Dramatic increases in vulnerabilities, exploits, security updates and patches have outstripped the ability of most organizations to keep up with information flow, analysis and necessary responses."
The bottom line here is cost and measurable benefit of security activities and investments. A new measurement paradigm is necessary. Organizations must be able to establish a baseline of acceptable risk, and find ways to monitor and manage the risk baseline to predictable levels of deviation. Using this proposed approach to operational security management, there is an opportunity for real cost savings, tangible benefit, and a reduction in the distraction factor.
What Should CIOs Measure?
Due to the lack of a meaningful risk baseline and performance data, information security organizations traditionally have not been held to the same stringent performance metrics as other areas of IT. For example, a network administrator is paid to provide service levels related to uptime, latency reduction, and cost per gigabyte. A help desk manager is rated based on number of "first-time-final" calls, and factors such as queue hold time. In most organizations, information security performance is not measured in such a specific and quantifiable manner. The selection of an analysis model for information security performance is the first problem CIOs need to solve.
When applied to information security investments, typical IT return-on-investment (ROI) models are incomplete, inaccurate, or inappropriate. For example, classical risk management doctrine requires the presence of certain variables in order to calculate risk and return levels. At a minimum, the risk management equation requires the governing cost/benefit factor of Annual Loss Expectancy (ALE). Some organizations, such as peer working groups and research firms, attempt to provide a snapshot of the ALE variable.
For example, the Computer Security Institute (CSI) and the U.S. Federal Bureau of Investigation (FBI) collaborate on an annual survey of computer crime and information security. Although the 2004 CSI/FBI survey describes annual losses for the prior year of $141 million, there were only 491 respondents to the survey. Using simple math, these results imply an average ALE per respondent of roughly $287,000. The problem with this number is that it is not correlated with organizational revenue, because respondents varied in total revenue from under $10 million (20%) to over $1 billion (37%). Other research suggests that the total amount of losses due to security problems is much higher than stated in the CSI/FBI survey. For example, damage from the Code Red worm in 2001 alone was estimated at $2 billion. A better ALE factor would have divided actual loss by total revenue, creating ALE as a percentage of revenue. The net result is that the lack of a reliable set of ALE numbers specific to industry and organizational size leaves a gaping hole in the risk management equation.
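The arithmetic above can be sketched in a few lines of Python. The loss total and respondent count are the survey figures quoted in the text; the two revenue values are hypothetical stand-ins for the smallest and largest respondent bands, chosen only to show how the same dollar loss reads very differently once it is expressed as a percentage of revenue.

```python
# Figures quoted in the text from the 2004 CSI/FBI survey.
TOTAL_REPORTED_LOSS = 141_000_000  # dollars
RESPONDENTS = 491


def average_loss(total_loss: float, respondents: int) -> float:
    """Naive average loss per respondent (the 'simple math' figure)."""
    if respondents <= 0:
        raise ValueError("respondents must be positive")
    return total_loss / respondents


def ale_as_pct_of_revenue(annual_loss: float, annual_revenue: float) -> float:
    """Annual loss expressed as a percentage of annual revenue."""
    if annual_revenue <= 0:
        raise ValueError("annual_revenue must be positive")
    return 100.0 * annual_loss / annual_revenue


avg = average_loss(TOTAL_REPORTED_LOSS, RESPONDENTS)
print(f"Average loss per respondent: ${avg:,.0f}")

# Hypothetical revenues (assumptions, not survey data).
for revenue in (10_000_000, 1_000_000_000):
    pct = ale_as_pct_of_revenue(avg, revenue)
    print(f"Revenue ${revenue:,}: loss is {pct:.3f}% of revenue")
```

Under these assumed revenues, the same average loss is close to 3% of revenue for the small organization and a few hundredths of a percent for the large one, which is why a single per-respondent average is a weak planning figure.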
Five Recommendations for CIOs
The next section of this article proposes a set of recommendations that together form a model for measuring the performance of security operations using data that is readily available within most organizations.
Recommendation #1 – Establish a Risk Baseline
To build a meaningful performance management framework for information security operations, we must start with information risk variables we can measure in an operational context, which are:
• Threats – Entities that will exploit a gap in cyber defenses, such as internal/external hackers, foreign governments, terrorists, etc.
• Vulnerabilities – Actual "chinks in the armor", such as software bugs, misconfiguration, or lack of attentiveness.
• Asset Value – The value of an information asset relative to others.
• Time – The amount of time from problem recognition to resolution/closure.
• Cost – Savings or loss resulting from performance and manipulation of these variables.
For this discussion we will assume that the systems we wish to measure are in a fully deployed and operational state, and that we have built the system as securely as possible. Such an assumption requires that organizations follow standard approaches to integrating security into the systems development lifecycle, such as those found in NIST Special Publication 800-37 or COBIT. The outcome of any comparable approach is that the system owner accepts residual technical, operational, and managerial risks that could not be resolved due to time, complexity, or cost. This baseline provides the benchmark against which we evaluate our risk posture, and our ability to react in a timely and effective manner to deviations from the baseline in the future.
Recommendation #2 – Conduct Real-Time Measurements of Changes in Risk Levels
As time passes, security events occur that will cause this risk baseline to deviate upward from zero. For example, a new software bug (vulnerability) will be discovered and new patches and exploits (threats) will occur as a result. There also may be unauthorized configuration changes (vulnerabilities) or internal attempts to affect the confidentiality or integrity of the system (threats). Each asset that can be affected by threats and vulnerabilities has a tangible value to the organization. All of these factors together represent aggregate deviations from the risk baseline, and can be quantified using the equation Threat * Vulnerability * Priority = Risk Deviation.
The level of granularity for this depiction can reflect a high-level view, such as a management summary of enterprise-wide risk values, or can be changed to a higher resolution to focus on the risks within a particular community of interest, such as an e-commerce site or the information assets of a specific business unit. The goal of this baseline activity is to develop and activate sources for the key risk baseline data fields.
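As a minimal sketch of the equation Threat * Vulnerability * Priority = Risk Deviation, the following Python computes a deviation per event and rolls it up by business unit and for the enterprise. The numeric scales, the asset names, and the reading of Priority as the relative value of the affected asset are all assumptions made for illustration; the article does not prescribe them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEvent:
    asset: str
    business_unit: str
    threat: int         # hypothetical 0-5 scale
    vulnerability: int  # hypothetical 0-5 scale
    priority: int       # hypothetical 1-5 scale, assumed to track asset value

    @property
    def risk_deviation(self) -> int:
        # Threat * Vulnerability * Priority = Risk Deviation
        return self.threat * self.vulnerability * self.priority


def deviation_by_unit(events):
    """Sum risk deviation per business unit (the higher-resolution view)."""
    totals = defaultdict(int)
    for event in events:
        totals[event.business_unit] += event.risk_deviation
    return dict(totals)


# Hypothetical events; a zero threat or vulnerability score yields zero deviation.
events = [
    RiskEvent("web-01", "e-commerce", threat=4, vulnerability=3, priority=5),
    RiskEvent("db-01", "e-commerce", threat=2, vulnerability=2, priority=5),
    RiskEvent("hr-fs", "hr", threat=1, vulnerability=4, priority=2),
]

by_unit = deviation_by_unit(events)
enterprise_total = sum(by_unit.values())  # the management-summary view
print(by_unit, enterprise_total)
```

Because the terms are multiplied, an asset with no active threat or no open vulnerability contributes nothing, which matches the idea of a baseline that starts at zero and deviates upward.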
Recommendation #3 – Benchmark the "Mean Time to Repair" for Security Problems
Another key vector for measuring security performance is time. The actual amount of time from the moment an IT organization recognizes a deviation in risk levels to the time it can state that it has successfully returned the enterprise to the accepted risk baseline constitutes a powerful performance metric. The time it takes to fix a problem can be characterized as "mean time to repair" (MTTR). Improvements in MTTR from period to period can be measured, and represent improvements in the efficiency of the organization in detecting and responding to deviations from the risk baseline.
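One way to compute MTTR from incident records is sketched below in Python. It assumes each incident is stored as a pair of timestamps (when the deviation was recognized, and when the baseline was restored); the sample dates are hypothetical.

```python
from datetime import datetime
from statistics import mean


def mttr_hours(incidents):
    """Mean time to repair, in hours, over (detected, restored) timestamp pairs."""
    durations = [
        (restored - detected).total_seconds() / 3600
        for detected, restored in incidents
    ]
    if not durations:
        raise ValueError("at least one closed incident is required")
    return mean(durations)


# Hypothetical incident records for two consecutive reporting periods.
q1 = [
    (datetime(2004, 1, 5, 9, 0), datetime(2004, 1, 6, 9, 0)),    # 24 hours
    (datetime(2004, 2, 10, 9, 0), datetime(2004, 2, 12, 9, 0)),  # 48 hours
]
q2 = [
    (datetime(2004, 4, 1, 9, 0), datetime(2004, 4, 1, 21, 0)),   # 12 hours
    (datetime(2004, 5, 3, 9, 0), datetime(2004, 5, 4, 9, 0)),    # 24 hours
]

q1_mttr = mttr_hours(q1)
q2_mttr = mttr_hours(q2)
improvement_pct = 100.0 * (q1_mttr - q2_mttr) / q1_mttr
print(f"Q1 MTTR {q1_mttr:.1f} h, Q2 MTTR {q2_mttr:.1f} h, "
      f"improvement {improvement_pct:.0f}%")
```

Only closed incidents are counted here; an organization could reasonably choose to track still-open incidents separately so that long-running problems do not disappear from the metric.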
The cost variable is derived from a number of factors including, but not limited to, manpower utilized or optimized, and financial benefit or detriment to the IT asset in question, e.g., downtime. Each organization must work upfront to determine meaningful components of the cost variable. For example, in organizations where business continuity planning is a core IT competency, the asset values that are calculated as part of the business impact analysis (BIA) process can provide the cost/asset value variable data.
The goal of this exercise is to understand how well an organization reacts to security problems, and the direct and indirect costs of the current levels of activity and technology. This process will set the starting point from which an organization can measure operational security improvement and performance.
Recommendation #4 – Compare Baseline Information to Desired Outcome
For a period at the inception of a security performance management activity, the primary goal would be to monitor the variables discussed in this article, and record baselines and behavior for risk, time, and cost. In the periods that follow this initial baselining activity, the organization would set performance goals associated with the achievement of agreed-upon values for risk, time, and cost deviations. From year to year, the management objective would be incremental performance improvement to shrink the deltas and to convert cost to strategic opportunity. Such a set of objectives would fit nicely into a global performance management framework such as the Balanced Scorecard, and would satisfy regulatory objectives in legislation such as the Sarbanes-Oxley Act by validating the efficacy of internal controls. In this example, "strategic opportunity" translates to the redirection of existing resources to new problems or challenges, or actual tangible cost savings.
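The comparison of recorded baselines against agreed-upon goals can be sketched as follows in Python. The metric names and every number are hypothetical; the point is only to show the two figures a manager would track per period: the remaining gap to target, and the improvement since the initial baseline.

```python
def deltas(observed: dict, targets: dict) -> dict:
    """Gap between the current value and the agreed target (positive = still above target)."""
    return {name: observed[name] - targets[name] for name in targets}


def pct_improvement(baseline: dict, current: dict) -> dict:
    """Percentage reduction relative to the initial baseline (baseline values must be nonzero)."""
    return {
        name: 100.0 * (baseline[name] - current[name]) / baseline[name]
        for name in baseline
    }


# Hypothetical values for the three variables: risk, time, and cost.
baseline = {"risk_deviation": 88.0, "mttr_hours": 36.0, "cost_per_incident": 5000.0}
targets = {"risk_deviation": 60.0, "mttr_hours": 24.0, "cost_per_incident": 4000.0}
current = {"risk_deviation": 66.0, "mttr_hours": 18.0, "cost_per_incident": 4500.0}

gaps = deltas(current, targets)
improvement = pct_improvement(baseline, current)

for name in targets:
    status = "met" if gaps[name] <= 0 else "open"
    print(f"{name}: gap {gaps[name]:+.1f} ({status}), "
          f"improved {improvement[name]:.0f}% vs. baseline")
```

In this made-up period the time goal is met while risk and cost still show a gap, which is the kind of per-variable view that could feed a scorecard.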
Recommendation #5 – Use SIM Technology to Automate This Process
A periodic paper exercise, such as certification and accreditation or an IT audit, will not help IT organizations establish meaningful performance metrics or measure success on a day-to-day basis. Technology and automation must be employed to gather the necessary telemetry described in recommendation number two, and to organize it into the performance management architecture described in the prior four recommendations. This technology is provided by a robust and mature security information management (SIM) platform.
To satisfy recommendations 1 and 2 in this article, it is important to gather key asset-related information, such as asset type, location, hardware and software configuration and patch levels, asset valuation, and services. Much of this information can be obtained from two sources: a well-defined asset management activity and the work performed during a business impact analysis (BIA). This phase provides the raw data required to feed the cost variable, and the asset value component of the risk variable. It is also important during this activity to develop a security monitoring architecture that provides telemetry regarding network and system behavior for the assets.
Threat intelligence comes from sources such as firewalls, IDS, IPS, switches, routers, and server log files. Vulnerability telemetry is provided by network and host scanners, and other automated sources that report the security posture of IT assets. All threat and vulnerability information required by this model can be gathered via the SIM input interface, then normalized and correlated by the SIM's correlation module. This information, combined with the asset information, provides the risk variable.
Recommendation 3 requires measurement of the time during which a problem or incident is "open." The actual identification/detection of a change in the risk baseline starts the clock on the time variable, and the measurement of the effectiveness of the incident management and risk mitigation processes. IT managers execute critical activities such as problem resolution and risk remediation (e.g., configuration and patch management) according to the priority dictated by the risk variable, and in accordance with negotiated service level agreements (SLA) related to the time objective.
The clock stops ticking when the problem is resolved, and final costs or cost-savings are calculated. A help desk or problem management system facilitates tracking of issues from cradle to grave, and the measurement of the effectiveness of the security response and problem management processes. For organizations using problem resolution environments such as Remedy, an interface between the SIM and the trouble ticketing system is beneficial and efficient.
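The start-and-stop clock described above, measured against SLA time objectives, might look like the following Python sketch. The ticket fields, priority labels, and SLA hours are hypothetical and do not reflect the data model of any particular SIM or trouble-ticketing product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical SLA targets: hours allowed to restore the risk baseline, by priority.
SLA_HOURS = {"high": 4, "medium": 24, "low": 72}


@dataclass
class Ticket:
    ticket_id: str
    priority: str
    detected: datetime                   # clock starts when the deviation is detected
    resolved: Optional[datetime] = None  # clock stops when the problem is resolved

    def hours_open(self, now: datetime) -> float:
        end = self.resolved if self.resolved is not None else now
        return (end - self.detected).total_seconds() / 3600

    def breached_sla(self, now: datetime) -> bool:
        return self.hours_open(now) > SLA_HOURS[self.priority]


def sla_compliance(tickets, now: datetime) -> float:
    """Fraction of closed tickets that were resolved within their SLA."""
    closed = [t for t in tickets if t.resolved is not None]
    if not closed:
        raise ValueError("no closed tickets to evaluate")
    met = sum(1 for t in closed if not t.breached_sla(now))
    return met / len(closed)


now = datetime(2004, 7, 10, 12, 0)
tickets = [
    Ticket("T1", "high", datetime(2004, 7, 1, 8, 0), datetime(2004, 7, 1, 11, 0)),
    Ticket("T2", "medium", datetime(2004, 7, 2, 9, 0), datetime(2004, 7, 3, 15, 0)),
    Ticket("T3", "low", datetime(2004, 7, 9, 12, 0)),  # still open
]

compliance = sla_compliance(tickets, now)
overdue_open = [t.ticket_id for t in tickets
                if t.resolved is None and t.breached_sla(now)]
print(f"SLA compliance: {compliance:.0%}, overdue open tickets: {overdue_open}")
```

Reporting compliance on closed tickets alongside a list of overdue open tickets keeps both the historical rate and the current backlog visible.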
Finally, recommendation 4 requires a roll-up of reporting regarding the overall performance of the entire process. The CIO and CISO can generate reports within the SIM platform that provide detailed information regarding the impact of threats and vulnerabilities in their environments, and the effectiveness of the process to respond to and mitigate these problems.
Conclusion
"What you measure is what you get." This article asserts that CIOs can and should insist on hard metrics related to improvements in day-to-day information risk levels, and to time- and cost-based efficiency in security problem resolution and the management of internal controls. Through the creation of a corporate security information management architecture, and the capture of key operational security performance indicators, CIOs can align information security performance measurement and management with the rest of IT and the enterprise.
* Edward Schwartz is senior architect for netForensics