Big Data: The big picture

Making sense of it all

To understand how Big Data came to be, it helps to examine the evolution of Google's flagship product, its search engine. Some may credit the web giant's meteoric rise to dominance to its intuitiveness and clean interface, but what really made Google special was the speed and relevance that its superior search algorithm delivered. Remember the early versions of AltaVista? Or Excite? They paled in comparison.

“It's so damn fast and it's so insightful that you take it for granted,” Jaquith says of Google. That had everything to do with Big Data, he says. Google developed a new way to do search by relying on non-relational databases and its home-grown MapReduce framework, which permitted the company to process queries against a massive number of distributed nodes. By forgoing conventional relational databases, Google was able to scale better and, in turn, produce pertinent results almost instantaneously.

“Big Data is just like the natural evolution of the fact that networks have gotten faster, bigger and servers can hold more things,” says John Kindervag, principal analyst at Forrester Research. “You just naturally want to put everything in it. If you have a big closet, by nature, you throw all your crap in the closet and sort through it when you want to…Once you have data, you can rule the world. Ask [Facebook founder] Mark Zuckerberg.”

Zions, in a way, is a microcosm of a Google or Facebook. Wood says that at the 30th largest bank in the United States, which counts nearly 11,000 people as employees and $50 billion in assets, applying a Big Data approach within his department is critical because security data “is different than the traditional data warehousing space.”

He says security assets are mostly unstructured and include things like firewall/anti-virus logs, packet captures, web log activity across internet banking and treasury management platforms, and login behavior on internal systems. But aggregating and analyzing that type of information wouldn't fly in Zions' traditional database management systems.

Wood says that after Zions outgrew the SIEM technology, it needed to develop a more robust way to process data from its 130 different sources if it was ever going to draw any real, timely value from that data. “Say you wanted to run a query across more than 30 days of data, you may be waiting hours for that to come back,” he says.

That meant, in 2005, building something called “multi-parallel computer processes,” which enabled the bank to leverage clusters of computers to aggregate and mine data. This allowed Zions to shed its reliance on security tools and start building its own internal models, which could do the job as well as, if not better than, technology it would otherwise pay a provider huge sums of money for.

Rather than continue looking for that latest security appliance to plug into his environment, he asked himself, “How can I leverage the data I already have to make a better business decision?”

William Ronca, executive VP of sales at Red Lambda, a security intelligence company based in Florida, agrees. He says many organizations deploy solution after solution, but none of them collaborate in any meaningful way.

“None of these 15 or 17 or 20 tools are integrated together,” he says. “They're doing separate jobs in the hope they're securing the environment in some way.”

One of the models Zions built from the data it analyzed is designed to fight spear phishing, in which certain people within a business, often executives, receive legitimate-looking emails that typically seek to install malware on their machines. It's a well-known social engineering ploy that has led to some high-profile breaches in recent years, including one last year at security firm RSA.

“You've got an organization getting millions of emails a day,” Wood says. “An attacker targets a handful of people and sends five emails in. How do you detect and respond before your employee clicks on a link they shouldn't?”

About two years ago, Zions needed even more scale, so it began leveraging Apache Hadoop, an open-source tool inspired by Google's MapReduce and Google File System frameworks. The bank contracted with a small vendor that helped it develop a customizable, enterprise-friendly version of the product.

“What Hadoop is is a piece of technology that you can distribute across tens of thousands to hundreds of thousands computers, and it splits all that data and then leverages your cluster for storage and computing power,” Wood explains. “Hadoop is our core security data warehouse. It's our core Big Data repository.”
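To make the "split the data, then use the cluster" idea concrete, here is a minimal single-machine sketch of the MapReduce pattern that inspired Hadoop. It is an illustration only, not Zions' system and not Hadoop's API: the log format, the sample lines and every function name are hypothetical, and a real cluster would run the map step on many nodes in parallel rather than in a loop.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical firewall log lines in the form "<source_ip> <action>".
LOG_LINES = [
    "10.0.0.1 DENY",
    "10.0.0.2 ALLOW",
    "10.0.0.1 DENY",
    "10.0.0.3 DENY",
    "10.0.0.2 DENY",
    "10.0.0.1 ALLOW",
]


def split_into_chunks(lines, n_chunks):
    """Split the input so each chunk could be handed to a different node."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map step: emit (source_ip, 1) for every denied connection in a chunk."""
    pairs = []
    for line in chunk:
        ip, action = line.split()
        if action == "DENY":
            pairs.append((ip, 1))
    return pairs


def shuffle(mapped_chunks):
    """Shuffle step: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapped_chunks):
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_denies(lines, n_chunks=3):
    """Count denied connections per source IP using split/map/shuffle/reduce."""
    chunks = split_into_chunks(lines, n_chunks)
    # On a real cluster, each chunk would be mapped on a separate node.
    mapped = [map_chunk(chunk) for chunk in chunks]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(count_denies(LOG_LINES))
```

Because each chunk is mapped independently, the answer does not depend on how many chunks the data is split into, which is what lets this style of processing spread across more machines as the data grows.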

Zions is not alone. According to Ventana Research, which last summer polled IT managers, developers and data analysts across hundreds of companies covering multiple verticals, 54 percent are using or considering Hadoop for “large-scale data processing needs.” Big Data is also becoming more popular in the cloud, where it is well suited given the massive number of distributed machines necessary to generate actionable intelligence. Several major providers, as well as a number of talented start-ups, are offering Hadoop services via the cloud.