Consumer behavior often is influenced by slick marketing tactics, like jingles that may make you want to pull your hair out. Case in point: If you’ve recently walked into a Subway to order a sandwich – and haven’t thought about five dollars and footlongs – you probably don’t own a television.
Well, the latest craze in information security doesn’t have an indelible tune to go along with it – at least not yet – but it does have a memorable, sexy-sounding name: “Big Data.” And as a result, everyone is talking about how organizations are aggregating, searching and analyzing voluminous information sets to make intelligent business decisions that may have been impossible to reach in the past.
“It’s a great phrase that has captured the imagination,” says Andrew Jaquith, chief technology officer of Perimeter E-Security, a managed security services provider based in Connecticut.
But for Preston Wood, the chief security officer at Zions Bancorporation, the parent company of some 550 bank branches in the western United States, the concept of Big Data isn’t new; there is simply a buzzword now for what Zions has been doing for more than a decade. In the late 1990s, the corporation began recognizing the enormous business value that could come from aggregating disparate data sets and drawing connections among them to glean actionable insight.
The company was an early adopter of security information and event management (SIEM) technology to make sense of its data sources. Some consider Big Data to be the next generation of SIEM.
“We had a Big Data strategy before Big Data was Big Data,” Wood, 40, recalls. “We thought, ‘How great would it be to take a lot of this unstructured data we have – that we are retaining for various reasons – and put it into a form factor to be able to analyze and mine that data to make better security decisions?’ You’d be able to start some fascinating analytics. You’d be able to ask questions of that data that you weren’t able to do in the past.”
If Zions thought it was dealing with large amounts of data at the end of the 20th century, imagine the volume now. Data is growing at astonishing rates across all industries. According to IDC, the amount of information created and replicated in 2011 exceeded 1.8 zettabytes – yes, zettabytes – a ninefold increase in just five years.
Each day, the world creates 2.5 quintillion bytes of data, according to IBM, so much that some 90 percent of the data in existence today was created within the last two years. Nearly every sector of the U.S. economy averages at least 200 terabytes of stored data per company with more than 1,000 employees, according to a report from the McKinsey Global Institute.
This breathtaking amount of data being created, managed and stored – both structured and unstructured – is reality, and many organizations are racing to dissect it. The vendor community also is charging full speed at the new opportunity. According to Thomson Reuters data, venture capital firms poured $2.47 billion last year into Big Data technologies.
Perhaps no two verticals deal with security and Big Data more than the information-intensive industries of financial services and health care, says Sean Martin, founder of Imsmartin Consulting, who formerly held marketing roles at several security firms.
For instance, a recent panel at the O’Reilly Strata Conference examined how Big Data may help financial organizations proactively spot the next crisis. And if new regulations are introduced in response to past events, data analysis may yield fresh ideas about how to cope with them.
When it comes to health care, meanwhile, some, such as Craig Mundie, Microsoft’s chief research and strategy officer, believe Big Data can help rein in soaring patient-treatment costs, Martin says. When data is shared openly (assuming HIPAA requirements are met), providers can better identify the areas driving higher-than-desired costs, Mundie reportedly told attendees last fall at the Techonomy 2011 conference. It makes sense that models like this will be explored, Martin says, considering that a Centers for Medicare and Medicaid Services report predicts health care costs will rise from $2.6 trillion to $4.6 trillion during this decade.
Making sense of it all
To understand how Big Data came to be, it might be wise to examine the evolution of Google’s flagship product, its search engine. Some may credit the web giant’s meteoric rise to dominance to its intuitiveness and clean interface, but what really made Google special was the superiority of its search algorithm, which delivered both speed and relevance. Remember the early versions of AltaVista? Or Excite? They paled in comparison.
“It’s so damn fast and it’s so insightful that you take it for granted,” Jaquith says of Google. That had everything to do with Big Data, he says. Google developed a new way to do search by relying on non-relational databases and its home-grown MapReduce framework, which let the company process data across a massive number of distributed nodes. So instead of depending on conventional relational databases, Google was able to scale better and, in turn, produce pertinent results almost instantaneously.
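The map, shuffle and reduce pattern behind MapReduce can be sketched in a few lines of Python. This is a minimal, single-process illustration, not Google’s implementation: the log format and the choice to count events per source address are assumptions made for the example. On a real cluster, the framework runs many map and reduce tasks in parallel on separate nodes and moves the intermediate key/value pairs between them over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (key, 1) pair for each log line; the key is the first field
    (assumed here to be a source IP address). Blank lines emit nothing."""
    fields = line.split()
    if fields:
        yield fields[0], 1


def shuffle(pairs):
    """Group intermediate values by key, the step a MapReduce framework
    performs between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here, a simple count."""
    return key, sum(values)


def map_reduce(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical web log lines: source IP, method, path, status code.
logs = [
    "10.0.0.5 GET /login 200",
    "10.0.0.7 GET /index 200",
    "10.0.0.5 POST /login 401",
    "",
]
print(map_reduce(logs))  # {'10.0.0.5': 2, '10.0.0.7': 1}
```

Because each call to `map_phase` looks at only one line and each call to `reduce_phase` looks at only one key, both phases can be split across as many machines as are available, which is what lets the approach scale.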
“Big Data is just like the natural evolution of the fact that networks have gotten faster, bigger and servers can hold more things,” says John Kindervag, principal analyst at Forrester Research. “You just naturally want to put everything in it. If you have a big closet, by nature, you throw all your crap in the closet and sort through it when you want to…Once you have data, you can rule the world. Ask [Facebook founder] Mark Zuckerberg.”
Zions, in a way, is a microcosm of a Google or Facebook. At the 30th-largest bank in the United States, which has nearly 11,000 employees and $50 billion in assets, Wood says applying a Big Data approach within his department is critical because security data “is different than the traditional data warehousing space.”
He says security data is mostly unstructured and includes firewall and anti-virus logs, packet captures, web log activity across the internet banking and treasury management platforms, and login behavior on internal systems. Aggregating and analyzing that type of information wouldn’t fly in Zions’ traditional database management systems.
After outgrowing its SIEM technology, Zions needed a more robust way to process data from its 130 different sources if it were ever going to draw real, timely value from them, Wood says. “Say you wanted to run a query across more than 30 days of data, you may be waiting hours for that to come back,” he says.
In 2005, that meant building something called “multi-parallel computer processes,” which let the bank use clusters of computers to aggregate and mine data. This allowed Zions to shed its reliance on commercial security tools and build its own internal models that could do the job as well as, if not better than, products bought from technology providers for huge sums of money.
Rather than keep searching for the latest security appliance to plug into his environment, Wood asked himself, “How can I leverage the data I already have to make a better business decision?”
William Ronca, executive VP of sales at Red Lambda, a security intelligence company based in Florida, agrees. He says many organizations deploy solution after solution, but none of them collaborate in any meaningful way.
“None of these 15 or 17 or 20 tools are integrated together,” he says. “They’re doing separate jobs in the hope they’re securing the environment in some way.”
One of the models Zions built from the data it analyzed was designed to fight spear phishing, in which certain people within a business, often executives, receive legitimate-looking emails that typically seek to install malware on their machines. It’s a well-known social engineering ploy that has led to some high-profile breaches in recent years, including one last year at security firm RSA.
“You’ve got an organization getting millions of emails a day,” Wood says. “An attacker targets a handful of people and sends five emails in. How do you detect and respond before your employee clicks on a link they shouldn’t?”
About two years ago, Zions needed even more scale, so it began using Apache Hadoop, an open-source framework inspired by Google’s MapReduce and Google File System designs. The bank contracted with a small vendor that helped it develop a customizable, enterprise-friendly version of the product.
“What Hadoop is is a piece of technology that you can distribute across tens of thousands to hundreds of thousands of computers, and it splits all that data and then leverages your cluster for storage and computing power,” Wood explains. “Hadoop is our core security data warehouse. It’s our core Big Data repository.”
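The splitting Wood describes can be illustrated with a toy model of how HDFS, Hadoop’s file system, stores data: it cuts a file into fixed-size blocks and keeps copies of each block on several machines (three by default), so storage and computation are spread across the cluster and the loss of one node does not lose data. The sketch below is a simplification under stated assumptions: the tiny block size, the node names and the round-robin placement are hypothetical, whereas real HDFS uses much larger default blocks (64 MB or 128 MB, depending on version) and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut data into consecutive fixed-size blocks; the last may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Hypothetical round-robin policy: block i starts at node i (mod the
    cluster size) and takes the next nodes in order. Real HDFS instead
    chooses nodes with rack awareness.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


# Hypothetical security data and a four-node cluster.
data = b"firewall,antivirus,weblog,login-events"
blocks = split_into_blocks(data, block_size=10)
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes)

for block_id, replicas in placement.items():
    print(block_id, blocks[block_id], replicas)
```

Because every block lives on more than one node, a job can usually be scheduled on a machine that already holds the data it needs. That is the sense in which Hadoop “leverages your cluster for storage and computing power.”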
Zions is not alone. According to Ventana Research, which last summer polled IT managers, developers and data analysts across hundreds of companies covering multiple verticals, 54 percent are using or considering Hadoop for “large-scale data processing needs.” Big Data is also becoming more popular in the cloud, where it is well suited given the massive number of distributed machines needed to generate actionable intelligence. Several major providers, as well as a number of talented start-ups, offer Hadoop-based services via the cloud.
Define, dissect and defend
So, as business leaders turn to Big Data in hopes of spawning lucrative business ideas and improving efficiency and agility along the way, someone has to protect these data stores, which, analysts say, present an attractive target for hackers and potentially a single point of failure for organizations.
“As security professionals, we need to realize we’re eventually going to be asked to be the security custodians of this data,” Forrester’s Kindervag says.
In a June 2011 report titled “Extracting Value from Chaos,” market analyst firm IDC concluded that less than a third of all information in the “digital universe” has at least “minimal” protection, while only half of the information that should be safeguarded actually is.
That might be a bitter pill to swallow for security professionals, who are well-versed in the sophistication and intentions of today’s cyber criminals, particularly well-funded nation-state adversaries who use low-and-slow techniques, known as advanced persistent threats (APT), to target coveted intellectual property, and then slowly and stealthily siphon out the booty without anyone noticing.
“If I’m a hacker of Anonymous, or part of an APT group, I’m really excited about the Big Data concept,” Kindervag says. “This is like Christmas to me. I don’t have to steal something from each individual store. I can steal the presents under the tree.”
Implementing proper access controls is important to safeguarding Big Data, he says. But encryption may be the real saving grace because it renders data unreadable. “It’s the only thing that’s going to protect us against these nation-state attacks,” he says. “We’re never going to keep ahead of those guys.”
But before deploying encryption, a sometimes difficult-to-manage technology, organizations must first define their data by discovering and classifying it. In other words, they need to determine which of their assets are the most “toxic.” Only then can they dissect them.
“That’s the exciting stage,” he says. “My fear is they won’t do stage one and they’ll do stage two, and people will steal stuff and they won’t know it because the data hasn’t been classified, and people don’t know how valuable it is.”
That’s not a problem at Zions, Wood says, where the security team has become the corporation’s champion of Big Data.
“We treat this environment as any environment within our organization,” he says. “Whatever security policies and controls you have, your Big Data repository needs to be looked at in the same light. Every technology has got things that need to be considered about how you secure it. It’s like any new process or application.”
BIG DATA: The three Vs
Volume – Big Data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
Velocity – Often, sensitive Big Data must be used as it streams into the enterprise in order to maximize its value to the business.
Variety – Big Data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more.