Organizations have become somewhat complacent in their thinking about risk, often treating it as a narrowly defined security problem. The past two years, however, have shown that risk management must consider a far broader set of threats. Organizations must think through how to deal with disruptions not only from hacks, but also from pandemics, civil unrest, natural disasters, and geopolitical issues.

In today’s digital world, customers expect organizations to operate continuously, no matter the current climate. When services go offline, the downtime costs millions, if not billions, and brands suffer lasting damage.

As organizations continue to race to the cloud to help accelerate digital transformation projects, the risk management plans for their cloud infrastructure have to account for fluctuations in the availability of personnel, hardware (even if they are completely in the cloud), network connectivity, and cloud providers.

Here are a couple of risk management strategies organizations should consider implementing for their cloud infrastructure:

  • Use hot standbys for critical workloads.

So far this year, a number of cloud providers have had service interruptions. Notably, the biggest cloud outages have not been related to security breaches. Rather, mundane system issues, such as faulty updates and errors in the scripts that manage these mega-platforms, have been the culprits. This summer, for instance, a glitch in a software configuration update impacted many government agencies and major companies; Akamai confirmed that a cyberattack did not cause the issue.

When designing and deploying their cloud architectures, organizations need to ensure the high availability of their applications to protect against unplanned (and planned) downtime. Any application built for high availability requires geographic replication, often to a data center in a different region or to a different cloud provider. In the past, organizations achieved high availability for on-premises applications with a multi-data-center topology, using an active-active configuration in which data is continuously replicated between two data centers.

To guard against provider-wide failures, organizations must now think about failing over to another independent cloud provider using a hot standby configuration, in which servers in a second cloud stand ready to process workloads with no delay when a failover occurs.
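The routing logic behind a hot standby can be sketched in a few lines. This is a minimal, hypothetical illustration: the endpoint names and the health flag are assumptions for the example, not any specific provider's API, and a real deployment would use DNS failover or a global load balancer with continuous health probes.

```python
# Hypothetical sketch of hot-standby failover routing. Endpoint names,
# URLs, and the health flag are illustrative assumptions; in practice a
# load balancer or DNS service performs the health probes and routing.

from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str      # e.g. a cloud/region label
    url: str
    healthy: bool  # result of an out-of-band health probe

def choose_endpoint(primary: Endpoint, standby: Endpoint) -> Endpoint:
    """Route to the primary while it is healthy; otherwise fail over to
    the hot standby, which is already replicated and ready to serve."""
    return primary if primary.healthy else standby

primary = Endpoint("cloud-a/us-east", "https://a.example.com", healthy=False)
standby = Endpoint("cloud-b/eu-west", "https://b.example.com", healthy=True)

target = choose_endpoint(primary, standby)
print(target.name)  # the standby is chosen because the primary is down
```

The key property of a hot standby, as opposed to a cold one, is that the standby's data is already replicated, so this switch incurs no restore delay.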

Hot standbys in another cloud are expensive, because the ingress and egress costs of moving data between clouds, or between on-premises systems and the cloud, add up quickly. Organizations should still consider them for their most demanding applications. In financial services, for example, having a plan to fail over to, or run across, multiple clouds is becoming a regulatory mandate, particularly in the European Union. Clouds can fail, but applications that are critical at a national or global level must remain functional.

  • Scale up and out to meet changing app needs.

The ability to scale up to meet surges in demand for compute, network, and storage is another area organizations need to factor into risk management. Information architects are now moving workloads between cloud providers and geographies to handle surges in customer activity or IoT data.

Elasticity should be viewed two ways. First, think of it as a local ability to expand and contract. Second, view it as a hierarchy of scale-out: within a data center, across a cloud vendor's regions, or across geographies (countries or continents). By scaling out along all of these vectors, organizations gain contingencies for drawing resources from multiple sources with independent failure modes.
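The scale-out hierarchy described above can be sketched as a capacity search that widens outward tier by tier. The tier names and capacity numbers here are hypothetical, purely to illustrate the pattern of spilling demand from inner to outer tiers.

```python
# Illustrative sketch of hierarchical scale-out: acquire capacity from
# the innermost tier first, then widen the search across the provider,
# then across geographies or providers. Tiers and numbers are made up.

CAPACITY = {  # free instance slots per tier, innermost first
    "local-data-center": 0,
    "same-cloud-other-region": 2,
    "other-cloud-or-geography": 50,
}

def acquire(slots_needed: int, capacity: dict) -> list[str]:
    """Walk the hierarchy outward, taking capacity from each tier until
    demand is met; returns the tiers actually used as 'tier:count'."""
    plan, remaining = [], slots_needed
    for tier, free in capacity.items():
        if remaining <= 0:
            break
        take = min(free, remaining)
        if take:
            plan.append(f"{tier}:{take}")
            remaining -= take
    if remaining > 0:
        raise RuntimeError("demand exceeds all available capacity")
    return plan

print(acquire(5, CAPACITY))
# the exhausted local tier is skipped; demand spills to outer tiers
```

Because each tier sits in a different failure domain, a request that cannot be satisfied locally still succeeds as long as some outer tier has capacity.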

Creating resilience to cyberattacks

Of course, security is a fundamental component of any risk mitigation strategy. The redundancy and availability measures outlined above add important layers of resilience to cyberattacks: it is far less likely that bad actors will successfully attack every one of these independent sources of infrastructure at once.

Encryption at rest and in flight is a mandatory capability for operating in today’s distributed environments. It depends, in turn, on sound key management and on the ability to easily enforce and manage how users and applications connect to networks, databases, and applications. Managing this across a handful of data centers was already difficult; with distributed systems now spanning multiple clouds and on-premises environments, the complexity has compounded. Fortunately, various vendors and open source projects provide frameworks and tools that address this set of needs.
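As one small, concrete example of enforcing encryption in flight, a client can refuse unencrypted or weakly encrypted connections at the TLS layer. The sketch below uses Python's standard `ssl` module; it shows only the client-side policy, and any host names or certificate sources in a real system would come from the organization's own key and certificate management.

```python
# Minimal sketch of enforcing encryption in flight with Python's
# standard ssl module: verify the server's certificate and refuse
# protocol versions older than TLS 1.2.

import ssl

def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context with certificate verification on and
    a TLS 1.2 floor, so plaintext or legacy-TLS peers are rejected."""
    ctx = ssl.create_default_context()  # enables cert verification
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.check_hostname = True           # the default, stated for clarity
    return ctx

ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # unverified peers rejected
```

Encryption at rest is the storage-side counterpart, typically handled by the database or storage engine together with an external key management service.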

Organizations need to design their cloud architectures to take advantage of multiple independent sources of infrastructure components. Today’s programming models and standards make this possible; organizations just have to seize the opportunity and use the available tools.

Lenley Hensarling, chief strategy officer, Aerospike