The internet continues to grow more interwoven, interdependent and fragile, with power increasingly concentrated in a handful of big cloud players. When a cyberattack brings down one of them, there is often extensive collateral damage to the thousands of enterprises relying on its service.
Case in point: in late October Amazon’s Route 53 DNS service was hit by a distributed denial of service (DDoS) attack (1). Amazon Route 53 connects user requests to infrastructure running in AWS, including Amazon S3 storage buckets, so this attack ultimately left many S3 customers across the United States unable to access their storage systems for approximately eight hours. This didn’t do much to raise enterprises’ confidence in the availability and security of their cloud-based data stores.
Cloud service providers are increasingly high-value targets for hackers—an opportunity to inflict the most harm with the least amount of work. As cloud adoption has increased dramatically, so too have attacks targeted at cloud service providers, with many cybersecurity experts predicting that attacks on cloud networks will increase significantly this year (2).
Where does this leave you – the enterprise relying on the cloud for mission-critical services? What happens when your key cloud service goes down? Because these cloud service attacks are inevitable, you need to prepare for a rainy day. This involves prioritizing preparation and planning to ensure resilience. Some keys for doing this include:
Monitor the health of all cloud infrastructure you’re using – whether for DNS, storage, infrastructure as a service (IaaS), or something else. Because you can’t directly control this infrastructure, keeping a close eye on it is a must. In simple terms, this means pinging the cloud infrastructure at regular intervals around the clock, to verify that it is responding quickly and reliably.
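The polling approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a production monitor: the endpoint URLs are placeholders, and the probe is injectable so the loop can be exercised without live network calls.

```python
import time
import urllib.request

def check_endpoint(url, timeout=5.0):
    """Probe a single endpoint; return (ok, latency_in_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except OSError:
        ok = False
    return ok, time.monotonic() - start

def monitor(urls, probe=check_endpoint, interval=60, rounds=1):
    """Poll each endpoint on a fixed interval, collecting results per URL."""
    results = {url: [] for url in urls}
    for i in range(rounds):
        for url in urls:
            results[url].append(probe(url))
        if i < rounds - 1:
            time.sleep(interval)
    return results
```

In practice this loop would run continuously (rather than for a fixed number of rounds) and feed an alerting system whenever a probe fails or latency spikes.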
There’s an important word of caution here: don’t monitor the cloud only from a cloud-based vantage point. While monitoring from the cloud is an easy, lower-cost approach, if your cloud-based service is in the same cloud you’re monitoring from, you’ll lose both the cloud-based service and your monitoring. This leaves you blind, with no idea your cloud-based service is down.
Monitor the end-user experience, from a wide variety of geographic and network vantage points. We suggest using backbone, ISP, wireless or other non-cloud vantage points as part of a monitoring mix focused on the end-user experience. Degrading end-user experiences (high latency, unavailability) across any one of a variety of network vantage points are often among the first signs that something may be going wrong with a cloud-based service. That’s how we detected problems with the AWS DNS service several hours before Amazon reported it – by noticing persistent end-user glitches in San Francisco and intermittent issues in Boston, Chicago and Dallas. Monitoring end-user experiences alongside the actual cloud infrastructure provides the most comprehensive picture of cloud service health.
Beyond major outages, end-user performance monitoring is also a best practice for detecting slight or brief performance degradations in certain geographies, which can quickly spread to others. It might also help you enforce service level agreements.
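One simple way to turn per-vantage-point latency data into an early-warning signal is to compare each location’s recent latency against its own longer-term baseline. The sketch below assumes latency histories collected by whatever monitoring mix you use; the vantage-point names and the 2x threshold are illustrative choices, not a recommendation from the incident itself.

```python
from statistics import median

def degraded_vantage_points(history, recent=5, factor=2.0):
    """Given per-vantage-point latency histories (seconds, oldest first),
    flag locations whose recent median latency is `factor`x worse than
    their own longer-term baseline."""
    flagged = []
    for vp, latencies in history.items():
        if len(latencies) <= recent:
            continue  # not enough data to form a baseline
        baseline = median(latencies[:-recent])
        current = median(latencies[-recent:])
        if current > factor * baseline:
            flagged.append(vp)
    return sorted(flagged)
```

A check like this, run across many geographic vantage points, is the kind of logic that surfaces a persistent problem in one city while other regions still look healthy.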
Have back-up and contingency plans in place. This may mean a multi-cloud strategy, where workloads can quickly and easily be shifted from one cloud service provider to another in the event of an outage. Such strategies can require some additional work and resources, but are often worth it. To its credit, Amazon had DNS backup plans in place for this incident, as packets were eventually re-routed through Neustar. That took over six hours, though. If your organization has its own backups at the ready, availability gaps like this can be closed much more quickly.
Another approach is having contingency plans to remove cloud-supported components from your site in the event of a problem. This might include popular cloud-based analytics services, like Google Analytics and Salesforce Analytics Cloud, which are incorporated into many modern websites. When a service like this experiences a problem, it can quickly create a domino effect, dragging down performance for the sites it supports. Enterprises should have contingency plans in place to remove such services immediately or replace them with a backup.
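One way to make that kind of removal an operational switch rather than an emergency code change is to gate each third-party script behind a health flag at render time. The sketch below is illustrative, assuming a simple server-side template step; the service names, script URLs, and fallback path are all hypothetical.

```python
def third_party_tags(services, status):
    """Emit <script> tags only for third-party services currently marked
    healthy. A degraded service is dropped, or swapped for a local
    fallback if one is configured, so it can't drag the page down."""
    tags = []
    for name, cfg in services.items():
        if status.get(name, False):
            tags.append(f'<script src="{cfg["src"]}" async></script>')
        elif cfg.get("fallback"):
            tags.append(f'<script src="{cfg["fallback"]}" async></script>')
    return "\n".join(tags)
```

Tying the `status` map to your monitoring output means a misbehaving analytics provider disappears from pages automatically, instead of waiting on a deploy.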
In summary, cyberattacks that bring down major cloud services are no longer a question of if, but when. In most cases, using the cloud will still provide a higher level of security than most organizations can achieve on their own. However, any enterprise relying on the cloud must take proactive steps to mitigate the impact of increasingly common attacks and contain their costliest byproduct: unplanned downtime.