We know what “fake news” is. It's a rumor treated as news with no validation. Some fake news is intentional. Yet the vast majority of fake news stories are rumors from bad sources, or from a single source lacking knowledge of the truth. And when the reporter feels the rumor fits his perspective of reality, he broadcasts it as real news. The old rule of requiring two confirming sources no longer matters in a world of speed and bias.
Security automation is fake news. In the venture-capital-backed security world of pushing unvetted solutions into operations, much of what is called security automation is fake security. It's automation made from a rumor – automation that kicks off a process based on an unsupported fact. The logic that automating a response can save a company has a corollary: automating a mistake will destroy a company.
Security operations have procedures that can be automated. But complete automation is impossible, as much of the decision-making process, and all of the business awareness, lives inside the brains of analysts. Automating procedures, watching them fail, and correcting them stepwise is not a perspective a Chief Security Officer can accept. The Lean Startup approach of “fail fast, fail often” is not something security can do.
Security Automation and Orchestration (SAO) exists to automate the response to issues that prevention failed to address. An automated response to what the prevention refused to do is an override of the logic that made that product work correctly. It's like bypassing the safety stop on your garage door because it is not closing all the way. It takes human awareness to resolve the problem when initial automation fails. Bypassing the design means bypassing the safety features.
It is a business approach to accept risk, but it is security's job to reduce it. SAO implementing responses that products refuse to implement increases risk in order to reduce workload.
Here are the arguments for automated response:
- It saves time
- My staff is overwhelmed with alerts
- I do not have enough staff
These are the common needs that SAO wants to resolve. But does the SAO approach really resolve them?
What Time Are You Saving?
Can you take a couple of extra minutes to avoid a major mistake? On average, there are more than 200 days between the detection of a security event and the response. Automation aims to change days into seconds. Would you settle for changing days into minutes if that's what it takes to do it right? Have a person review it and make the proper adjustments? Isn't that worth the extra effort?
I am not making the argument that prevention is bad. Just the opposite. I am making the case that if prevention could be automated directly from the alert, the prevention product would have already done that. We are talking about responding to an attack that has made it through all the corporate prevention, and that some aspect of the compromise has triggered the need to respond. Security automation is not prevention; it is a very fast response.
The issue with SAO implementing a control action, such as a block, is that it is doing it outside the logic of your prevention tools. Thinking that automation will prevent what all your other prevention has failed to do is a foolhardy belief.
Let's take a look at a common marketing discussion, using VirusTotal as an example of why SAO is supposedly needed.
VirusTotal Automation Playbook
The most common example of automation is integrating with VirusTotal. The process is simple: a file is captured on the network and sent to VirusTotal; VirusTotal says it's bad, and automation sends a response to the desktop, saving it from what local endpoint protection (EPP, or antivirus to the layman) missed.
Even VirusTotal would say that is not its intended use. VirusTotal's FAQ states, “it is by no means a full-fledged antivirus and we do not want it to be …” VirusTotal is a second opinion, not a means of primary detection. The reason is clear to anyone who has reviewed a significant number of results on VirusTotal: false-positives.
The odds of one engine getting it wrong are quite high. If you use VirusTotal as a voting algorithm to avoid false-positives, then you are going to miss new variants and advanced attacks. Lower the number of engines needed for verification, and you will most likely detect false-positives more frequently than new attacks. Furthermore, there is much in the naming of viruses that a human can understand – the differences between a generic alert, adware, spyware and bots.
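The voting tradeoff can be sketched in a few lines. This is a hypothetical playbook rule over made-up engine verdicts, not the real VirusTotal API: a strict threshold misses the new variant, while a loose one quarantines a harmless antivirus update file.

```python
# Sketch: a hypothetical voting threshold over per-engine verdicts, the kind
# of rule an SAO playbook applies to a multi-engine report. Engine names and
# verdicts are invented for illustration.

def should_block(verdicts, min_detections):
    """Block only if at least min_detections engines flag the file."""
    return sum(verdicts.values()) >= min_detections

# A new variant: only two engines have signatures for it yet.
new_variant = {"EngineA": True, "EngineB": True, "EngineC": False,
               "EngineD": False, "EngineE": False}

# An antivirus update file that one engine mislabels (a common false-positive).
av_update = {"EngineA": True, "EngineB": False, "EngineC": False,
             "EngineD": False, "EngineE": False}

# A high threshold misses the new variant; a low one blocks the update file.
print(should_block(new_variant, min_detections=3))  # False: variant missed
print(should_block(av_update, min_detections=1))    # True: update quarantined
```

Whichever threshold you pick, the automation is wrong on one side of it; a human reading the detection names can tell the two cases apart instantly.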
Lastly, intercepting a file and reviewing it with antivirus (or better, a sandbox) is already baked directly into products without needing to integrate. Numerous vendors have these capabilities, both on the network and as callbacks from the desktop. By the way, viruses moving across the network will look different from ones installed on a system, which is the default VirusTotal scan configuration. More importantly, the antivirus products you own are in prevention mode, unlike automation. If you want to detect the cutting edge, endpoint detection and response (EDR) products will typically work better than VirusTotal, and do so with prevention.
VirusTotal is powerful when used correctly, but weak as an automated response mechanism when compared to fully integrated prevention solutions. The idea that an administrator can write a Python script on the weekend that outperforms years of research and programming is a stretch.
Lesson in this Story
An SAO playbook using VirusTotal to implement a response is a complex solution that is inferior to EPP and EDR. Yet this is a common SAO example. It seems logical in a blog or marketing example. But one of the most common VirusTotal false-positives is alerting on antivirus update files – something the SAO example would wind up quarantining and blocking. It's one reason you don't run two antivirus engines on the same machine: one security product simply blocks the other.
I sat at a Gartner conference in May and listened to a government contractor proudly claim they process thirty million alerts a day. Counting all the incoming log messages is a vanity statistic, because it does not measure progress. One can send all their flow data to a log manager and get hundreds of millions of events. It does not even tell you how much of the network, its users or its hosts the logs let you see.
People constantly look at log analysis the wrong way. Instead of trying to collect everything, ask yourself what data you need. To do this, determine what alerts are meaningful to you and what information in those alerts you are capable of leveraging. It's like buying random spices and ingredients without any idea of what you want to cook, or what you are capable of cooking.
Instead of counting how much data you are collecting as your primary metric, look at where you are trying to get something done and see what steps and data are missing:
system recovery <- determine asset <- validate infection <- determine infection
When an analyst is looking at all the data for systems that have been compromised, he is not looking at all the alerts. And when he gets an alert that says a machine has an infection there are a number of alerts and derived data (remember this) that he uses to confirm it. Once confirmed, he needs to look up what asset had that address at that time. And finally, depending on the type of inspections he performs, there are different responses.
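The asset-lookup step above is a concrete, mechanical piece of this chain. As a minimal sketch, assuming DHCP-style lease records with an invented layout, the question "what asset had that address at that time" looks like this:

```python
# Sketch of the asset-lookup step: given DHCP-style lease records, find which
# asset held an address at the time of the alert. The record layout, asset
# names, and times are assumptions for illustration.
from datetime import datetime

leases = [  # (address, asset, lease_start, lease_end)
    ("10.0.0.5", "laptop-jsmith", datetime(2018, 6, 1, 8),  datetime(2018, 6, 1, 17)),
    ("10.0.0.5", "laptop-akhan",  datetime(2018, 6, 1, 18), datetime(2018, 6, 2, 2)),
]

def asset_at(address, when):
    """Return the asset holding `address` at time `when`, or None."""
    for addr, asset, start, end in leases:
        if addr == address and start <= when <= end:
            return asset
    return None  # gap in visibility: no confident decision possible

# The same address maps to different assets depending on the alert time.
print(asset_at("10.0.0.5", datetime(2018, 6, 1, 20)))  # laptop-akhan
```

Note the `None` branch: when the lease data has a gap, automation has nothing to act on, which is exactly where human awareness re-enters the loop.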
There are a number of metrics and types of data that matter:
- Determine the type of infection (adware, worm, bot versus ransomware)
- Know what asset is assigned to what address
- Validation information (derived data or secondary alert)
- The key event, an alert that states an infection
The most difficult part of this is validation. If you spend all your time counting the information you are collecting, you are not resolving an issue. The most common way to validate an infection is to determine whether there are unusual connections or a scan out (a high volume of connections to other systems). This act of validation, and evaluating all this raw data about connections, is the issue.
Most of the raw data is what drives up the number of events. We still want this data, for it is useful for analytics and insight. This data is used for powerful results. However, it is a significant amount of data and not really useful for humans until there is something to investigate.
The issue is not handling all these alerts, for they are not all alerts. The issue is managing all this data and making decisions from it.
If there are only a couple infections a week per office, automating a response is not the issue. It is assigning the analytics and making a decision. It is also making sure you have all the data to make that decision or perform analytics.
Most organizations do not have too many alerts. They have too little insight into their network and are unable to perform the actions to resolve alerts. It's easier to blame the workload. Workload could be an issue, but counting all your data is not useful and not helping you get things done.
You are Not Getting More People
The question of scaling people is weighing the new support, skills and complexity that SAO introduces against the work it reduces. The push and pull here are:
- Time saved by automating actions
- Time loss by new procedures, skills and support
- Time lost in correcting automated mistakes (rescinding actions)
- Business lost due to automated mistakes (work lost to actions)
Is automation the answer, or is scaling your talent the answer? The difference is that scaling talent means making their job about making decisions as much as possible, and having those decisions replicated. Every time one of your analysts uses cut and paste, they have done something unproductive.
Even without Machine Learning (ML) and Artificial Intelligence (AI), computers can augment a staff's workload. They can store, search and derive data fast and without the human aspect of error from boredom. They can perform endless analytics and are not offended when you do not use them. With ML and AI, computers can categorize and predict, finding trends in data that are difficult for people to identify.
As of today, none of what a computer does in security demonstrates talent and skill. Computers do not make the intuitive leap between the asymmetrical goals of security and business. Computers execute the plan. As Publilius Syrus said, “It is a bad plan that admits of no modification.”
Human resources are limited and valuable in security. What makes them valuable is not the grinding of going through data and alerts. What makes them valuable is their insight and decision-making capabilities.
It should be obvious by now that computers set them up and people take them down. It also should be obvious that in this relationship the most important aspect of people is their ability to make decisions. The immediate power of today's analytics includes computers attempting to make a decision while humans oversee and correct those decisions.
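That oversight relationship can be sketched as a routing rule: the computer proposes a response, and anything below a confidence bar goes to an analyst's queue instead of executing. The function name and threshold are invented for illustration, not any product's behavior.

```python
# Sketch of human-in-the-loop orchestration: execute only high-confidence
# proposals; queue the rest for an analyst. Threshold is an assumption.

def route(proposal, confidence, threshold=0.95):
    """Return ('execute', proposal) or ('review', proposal)."""
    if confidence >= threshold:
        return ("execute", proposal)
    return ("review", proposal)  # the analyst makes the call

print(route("quarantine host 10.0.0.5", confidence=0.60))  # goes to review
```

The interesting engineering is not the routing, it is earning a confidence score honest enough to trust at the top of the scale.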
X Marks the Spot
In my first job out of college, I reviewed technical documents. I needed the job. I remember having to read the acronym appendix to a military transportation system for correctness. Even acronyms have acronyms in the military. Listed under the Ms was “MARGARITA.” You need one if you have read this far. Strangely, it was a life lesson that taught me you need to grind through something to get it right. So, you get it – automation can be bad. So, when is it not? Let's wrap this one up.
A man retired from a factory gets a call that the assembly line has broken down. Being an expert, they ask him to come in. He strolls into the factory, chalks an X on the side of a machine and asks one of the big guys to strike it. A loosening sound is heard, and the assembly line starts moving again. The man writes up a bill for his visit. The manager asks, “That much for an X?”
The retired man says, “That much to know where to put it.”
Automation is awesome. It does save a significant amount of work, time, and tedious effort. If your job routinely requires you to control-c and control-v, you will be replaced by automation.
This is the gist of automation: the most important aspect is the decision that starts it. A good decision, backed by good information, is the key. Real news is valuable and useful. It might not be the news we want, but it is the news we need.
Orchestration is the automation implemented by a trigger. In SAO, this trigger needs to be a quality decision. That's why I say Security Analytics and Orchestration instead of Security Automation and Orchestration.
Conclusion: Decision Making is the X-Factor
We started with fake news. It happens because producers and editors of the news bypassed their usual safety procedures. They bypassed the quality check of confirming sources. The decision to move forward was actually the lack of a decision. It was an immediate action to get a result without determining if the action was correct in the first place.
SAO makes many of the same promises as SIEM tools: scalability, force multiplication, and efficiency. But is SAO actually different in what it does? Some vendors can sandbox and respond faster to malware. Web and DNS filters can remove bad sites before they show up as an incident. Better filtering and removal of shadow IT will remove more alerts and reduce risk. Every example of SAO has an integrated product solution that does it more efficiently, faster, and just plain smarter. It seems SAO is presenting a solution to a problem addressed elsewhere.
What the SAO playbook needs are smarts. Real smarts, not just marketing saying that it's “smart.” There needs to be a reason to bypass the logic of security prevention tools. That reason is that at a higher level of correlation and network vision, there is an awareness that allows a quality decision to be made. Making decisions is the SAO X-Factor. When SAO can make situational decisions, it will become powerful. It's the ability to determine why prevention failed, correct the problem, and implement a response. Every salesperson knows to “Start with Why.” Every SAO needs to do the same.