From Big Data to smart data for security analytics

Have you heard the grand promise that Big Data analytics will reveal magical insights and enable organizations to transmute lead into gold? The idea seems to lack depth, and there is no concrete plan in the roadmap. Big Data security analytics has the potential to be either a very effective solution or just another buzzword thrown around at conferences.

Security analytics solutions still leave much to the operator. For most organizations that means they won't deliver the goods without a so-called ‘Data Scientist’ who is more than likely a statistician who happens to have Hadoop experience. Real security analytics demands a practitioner who has direct field knowledge of computer security, proving once again that human talent is indispensable. Problem is, human talent doesn't scale.

Big Data has become more about storage than analytics. Many organizations think having a Hadoop cluster means they are doing Big Data. Without analysis, though, that promise of magic goes unrealized. There is comfort in knowing the data is there, but for most enterprises, useful analytics remains a distant station on the roadmap.

According to the Ponemon Institute, “Big data analytics in security involves the ability to gather massive amounts of digital information to analyze, visualize and draw insights that can make it possible to predict and stop cyber-attacks.” That's a mouthful but lacks precision.

The definition implies that enterprises need to vastly expand their storage networks and only then will the payoff come. Vendors will say that Big Data is about storage and that their generic tools will reveal previously unseen patterns for cyber defense.

Let me challenge this by saying:

Analytics tools are only useful to the degree that they incorporate the knowledge of seasoned experts.
Storing more data is not only expensive, but counterproductive.
Focus on smart data, not Big Data.

The message is clear: you only need what is relevant. Want to detect malware? Try examining how software behaves at an endpoint. Most malware can be detected just by looking at how it installs itself on the system. Tracking APT movement? Try looking at user profiles at the endpoint. Cracked or misused accounts will stand out from normal behavior. Hoarding packet logs will not get you these answers.
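To make the user-profile idea concrete, here is a minimal sketch in Python. It assumes a deliberately simple profile: the set of hosts each account has logged into before. The account names, host names and function names are all hypothetical, and a real profile would cover far more than login hosts.

```python
from collections import defaultdict

def build_baseline(history):
    """Map each account to the set of hosts it has logged into before."""
    baseline = defaultdict(set)
    for account, host in history:
        baseline[account].add(host)
    return baseline

def unusual_logins(baseline, recent):
    """Return recent (account, host) pairs that fall outside the account's baseline."""
    return [(account, host) for account, host in recent
            if host not in baseline.get(account, set())]

# Hypothetical login records collected at the endpoints: (account, host).
history = [("alice", "ws-01"), ("alice", "ws-01"),
           ("bob", "ws-02"), ("bob", "file-srv")]
recent = [("alice", "ws-01"), ("alice", "dc-01"), ("bob", "file-srv")]

baseline = build_baseline(history)
flagged = unusual_logins(baseline, recent)
print(flagged)
```

The point of the sketch is the size of the input: a handful of login records per account is enough to make a misused account stand out, with no packet capture involved.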

Need for security domain knowledge

To have successful analytics you must leverage domain knowledge. If you want to automate threat detection, then automate how an expert examines an endpoint.

Intrusions and compromises follow specific patterns, bounded in their shape by the rules that govern operating systems and software. This is a limited set. An experienced cyber investigator will spot these. Also, malicious things tend to look different from everything else. They have properties that make them stand out from the baseline. For example, a binary running out of the recycle bin is abnormal, and any investigator would identify it as a potential threat.

This would get the analyst's attention.
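The recycle-bin check is easy to express in code. The sketch below is an illustration in Python, not a description of any particular product: it assumes the process list has already been collected from the endpoint, and the process names and paths are made up. The directory names are the ones Windows uses for the recycle bin ($Recycle.Bin on current versions, RECYCLER and RECYCLED on older ones).

```python
from pathlib import PureWindowsPath

# Directory names Windows uses for the recycle bin, lower-cased for comparison.
RECYCLE_DIRS = {"$recycle.bin", "recycler", "recycled"}

def runs_from_recycle_bin(image_path):
    """True if any directory in the binary's path is a recycle bin folder."""
    directories = PureWindowsPath(image_path).parts[:-1]  # drop the file name
    return any(part.lower() in RECYCLE_DIRS for part in directories)

# Hypothetical process listing collected from an endpoint.
processes = [
    {"name": "svchost.exe", "path": r"C:\Windows\System32\svchost.exe"},
    {"name": "svch0st.exe", "path": r"C:\$Recycle.Bin\S-1-5-21-1\svch0st.exe"},
]

suspicious = [p["name"] for p in processes if runs_from_recycle_bin(p["path"])]
print(suspicious)
```

The rule needs one field per process, the image path, which is the kind of smart data this article argues for.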

Increase productivity via automation

Automation will not replace cybersecurity jobs, but repetitive investigative tasks can certainly be automated. When confronted with a potential intrusion, an investigator will consistently look at a small set of things. Then, based on the results of this initial look, he or she will head down a variety of paths. Each path ends in one of two places: “Pwnage” or “Rabbit Hole.”

I posit this initial investigation can be (largely) automated. If an actual intrusion is detected, then automation stops. It's at this point that the wisdom and creativity of the investigator come into play as they attempt to build the story around attribution and impact.

An automated security analytics system must replicate what an expert cybersecurity investigator would do. Useful analytics requires that smart data be collected from endpoints throughout a network. The analytics must reason over the data with domain-specific security knowledge in order to identify meaningful patterns indicating true security events.
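One way to picture such a system is as a fixed list of expert rules run against each endpoint, with a hand-off to a human as soon as any rule fires. The Python sketch below is only an illustration under that assumption. The two rules, the endpoint fields and the off-hours window are hypothetical stand-ins for what a seasoned investigator would actually encode.

```python
from typing import Callable, Dict, List, Optional, Tuple

Endpoint = Dict[str, list]
Check = Callable[[Endpoint], Optional[str]]

def autorun_from_temp(endpoint: Endpoint) -> Optional[str]:
    """Illustrative rule: persistence entries that launch from a temp directory."""
    for entry in endpoint.get("autoruns", []):
        if "\\temp\\" in entry.lower():
            return f"autorun entry launches from temp directory: {entry}"
    return None

def admin_login_off_hours(endpoint: Endpoint) -> Optional[str]:
    """Illustrative rule: admin logins between 22:00 and 06:00 (assumed window)."""
    for user, hour in endpoint.get("admin_logins", []):
        if hour < 6 or hour >= 22:
            return f"admin login by {user} at {hour:02d}:00"
    return None

CHECKS: List[Check] = [autorun_from_temp, admin_login_off_hours]

def triage(endpoint: Endpoint, checks: List[Check] = CHECKS) -> Tuple[str, List[str]]:
    """Run every check; automation stops and a human takes over on any finding."""
    findings = []
    for check in checks:
        message = check(endpoint)
        if message is not None:
            findings.append(message)
    verdict = "escalate to investigator" if findings else "no action"
    return verdict, findings

# Hypothetical smart data collected from two endpoints.
clean = {"autoruns": [r"C:\Program Files\Vendor\agent.exe"],
         "admin_logins": [("ops", 10)]}
odd = {"autoruns": [r"C:\Users\bob\AppData\Local\Temp\upd.exe"],
       "admin_logins": [("ops", 3)]}

print(triage(clean))
print(triage(odd))
```

Each rule is a few lines of domain knowledge, and adding another means appending a function to CHECKS. The investigator's time is then spent only on endpoints that return findings.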

In short, the goal of your Big Data initiative should not be collecting vast amounts of data in the hope that your attack data will somehow fall out of the mass. It should start with an intelligent look at which smart data to collect and which analytics tools will allow security experts to be more productive.