Amazon cloud failure highlights customer responsibility

Amazon, in a lengthy letter posted to its website on Friday, has apologized for a cloud computing outage that left many popular sites unavailable.

But the incident likely will prompt some to re-evaluate their use of cloud services, according to experts.

Amazon said the outage, which began on April 21 and lasted for several days, occurred when a network configuration change to shift traffic, intended to upgrade the capacity of the network, was executed incorrectly.

For security practitioners, the incident should serve as a reminder that, like any technology, the cloud can fail, and they must be prepared for that, Philip Cox, director of security and compliance at consultancy SystemExperts, told SCMagazineUS.com on Friday.

Drawn by the cloud's promises of ubiquitous data and unlimited storage capabilities, many organizations rush to take advantage of cloud services without considering the risks, he said. The Amazon outage underscores the need to have disaster recovery and business continuity plans in place.

“The cloud is made of technology,” Cox said. “You should have good processes and practices around the technology you use.”

While some are using the outage as a reason to assess whether their disaster recovery plans are up to par, others are reconsidering their use of cloud computing altogether, Cox said.

“Businesses are evaluating -- which they should have done in the first place -- whether cloud computing is something right for their needs,” he said. “Hopefully they do a proper business analysis.”

Amazon, meanwhile, has promised to audit its change process and, to prevent a similar incident from happening again, automate the network switching process that caused the issue.

“We want to apologize,” Amazon said in its letter. “We know how critical our services are to our customers' businesses and we will do everything we can to learn from this event and use it to drive improvement across our services.”

The outage affected a number of well-known websites and web services, including Foursquare, Quora, HootSuite and Reddit.

Ironically, one of the websites knocked offline in the outage was Cloutage.org, a new project whose mission is to document security incidents involving cloud services.

Jake Kouns, president of the Open Security Foundation, which runs the project, told SCMagazineUS.com on Friday that the Cloutage site, hosted using the Amazon Elastic Compute Cloud (EC2), was unavailable for several days.

Kouns said the organization is still investigating the incident, but some files appear to have been overwritten as a result of the outage. The site is operational again, but he criticized Amazon for failing to provide any information during the outage.

Amazon has offered affected customers a 10-day service credit. In addition, the company has promised to improve the way it communicates with customers and provide more frequent updates.

Kouns also said the incident is a reminder that cloud services are not fail-proof.

“The same risks apply to the cloud that are in your existing environment, and you need to make sure those [security] controls are in place,” he said. “While it feels good to trust the cloud, we continue to see these expert cloud providers have the same issues as traditional organizations. Continue the same risk-based approach to putting the proper security controls in place.”