Big Data is arguably one of the killer apps to emerge over the past decade. Much of the technology traces back to MapReduce, a technique developed by Google that uses parallel processing to generate analytics from massive amounts of data. Hadoop, an open source implementation of MapReduce, has effectively “democratized” the availability of Big Data. With this easy-to-use platform, enterprises are finding new ways to solve problems and extract value from data.
However, Big Data analytics often involve access to data that should be protected, such as medical records, tax information and personally identifiable information (PII). Security and compliance professionals need to ensure Big Data deployments do not violate access control policies with respect to this information.
Within a Hadoop infrastructure there are several levels of authorization, including access to the Hadoop cluster, inter-cluster communications and access to the data sources. Many of these authorizations are based on Secure Shell (SSH) because its authentication protocol is considered secure and it has good support for automated machine-to-machine (M2M) communication. The access control questions are straightforward to state:
- Who sets up the authorizations to run Big Data analytics?
- How are those authorizations and credentials managed, and what happens when there are personnel changes?
- Are authorizations based on “need to know” security principles?
To protect sensitive information accessed by Big Data analytics, the following best practices are recommended:
- Discover: Take an inventory of the authorizations and identities within the Big Data environment.
- Monitor: Track the use of those identities. Find out which identities are not needed and can be removed.
- Manage: Establish centralized control over identity management in the Big Data environment.
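As a rough illustration of the Discover and Monitor steps, the sketch below inventories SSH key authorizations and flags those that have not been seen in use. It assumes the authorizations are stored in OpenSSH `authorized_keys` format and that the fingerprints of keys actually used have already been collected from login logs by some other means. The names `KeyEntry`, `parse_authorized_keys` and `unused_keys` are hypothetical, and the option parsing is deliberately simplified.

```python
import base64
import hashlib
from dataclasses import dataclass

# Common OpenSSH public key types; extend as needed for your environment.
KEY_TYPES = {
    "ssh-rsa", "ssh-dss", "ssh-ed25519",
    "ecdsa-sha2-nistp256", "ecdsa-sha2-nistp384", "ecdsa-sha2-nistp521",
}


@dataclass
class KeyEntry:
    owner: str        # account whose authorized_keys file was read (assumed input)
    key_type: str
    fingerprint: str  # SHA256 fingerprint in the style OpenSSH prints
    comment: str
    options: str      # e.g. from="..." or command="..." restrictions, if any


def fingerprint(key_b64: str) -> str:
    """Return the SHA256 fingerprint of a base64-encoded public key blob."""
    digest = hashlib.sha256(base64.b64decode(key_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def parse_authorized_keys(owner: str, text: str) -> list:
    """Discover: list every key authorization found in one authorized_keys file.

    Simplification: options are whatever precedes the key type token, so an
    option value that itself contains a key type name would be misparsed.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        idx = next((i for i, t in enumerate(tokens) if t in KEY_TYPES), None)
        if idx is None or idx + 1 >= len(tokens):
            continue  # not a recognizable key line
        entries.append(KeyEntry(
            owner=owner,
            key_type=tokens[idx],
            fingerprint=fingerprint(tokens[idx + 1]),
            comment=" ".join(tokens[idx + 2:]),
            options=" ".join(tokens[:idx]),
        ))
    return entries


def unused_keys(entries: list, seen_fingerprints: set) -> list:
    """Monitor: return authorizations whose key never appeared in the usage data."""
    return [e for e in entries if e.fingerprint not in seen_fingerprints]
```

Running `parse_authorized_keys` over each service account in the cluster yields the inventory. Passing that inventory to `unused_keys`, together with the set of fingerprints observed in use, gives a candidate list for removal that should still be reviewed by a person before any key is deleted.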
Big Data has opened up new access to business-critical data. Organizations need to keep pace with resulting security concerns and bring Big Data under a sound identity and access management umbrella.