The fast-paced Kubernetes ecosystem has given rise to multiple tools and concepts that have transformed how organizations deploy and operate applications in cloud environments. One tool that has garnered particular interest is the service mesh: an infrastructure layer that controls and manages the network communication between the microservices that comprise a Kubernetes application.
Exemplified by projects such as Linkerd and Istio, the service mesh originally found its adopters in the domain of platform owners and site reliability engineers (SREs). These engineers, tasked with building the internal infrastructure platform for their organization, used the service mesh to build reliability features into their platform, including load balancing, automated retries, blue-green deploys, and observability.
More recently, the service mesh has found another interested audience—security and compliance teams. Over the past year, CISOs and their teams have increasingly turned to the service mesh to improve their security posture. This is especially true in the cloud, where sophisticated attackers, complex threat models, and the lack of control and ownership over the low-level networking infrastructure combine to form a perfect storm of exposure and risk.
CISOs take notice
To understand why CISOs are now so interested, it helps to know how the service mesh works. The most successful service meshes use a sidecar proxy—a network proxy inserted into each Kubernetes pod (the smallest deployable unit of an application). That proxy transparently handles all incoming and outgoing traffic to that pod. By measuring and modifying the traffic traversing the pod, this proxy can implement a variety of features without requiring changes to the underlying application. And since it's the single point of ingress and egress for all traffic to that pod, it's in an ideal place to implement network security controls for that pod.
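As a concrete illustration of how transparent this is in practice, consider how Linkerd adds its sidecar: a single annotation on a workload asks the mesh's admission webhook to inject the proxy container into each pod it creates. (This is a minimal sketch; the workload name and image are hypothetical placeholders.)

```yaml
# Deployment excerpt: the linkerd.io/inject annotation asks Linkerd's
# admission webhook to add the sidecar proxy to every pod it creates.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
      annotations:
        linkerd.io/inject: enabled   # proxy injected transparently at admission time
    spec:
      containers:
      - name: web
        image: example.com/web:v1    # placeholder image; app code is unchanged
        ports:
        - containerPort: 8080
```

The application container itself requires no modification; the injected proxy intercepts the pod's inbound and outbound traffic, which is what makes the "no changes to the underlying application" property possible.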
Of course, deploying hundreds or thousands of network proxies within a single cluster only makes sense if those proxies are lightweight and transparent to the application. The most advanced service meshes focus on minimizing the computational and operational footprint of each proxy. Linkerd, for example, uses ultralight proxies written in Rust for performance and security, rather than the more popular but more complex C++-based Envoy proxy.
Regardless of implementation details, for platform owners focused on reliability, the service mesh's ability to deliver features like latency-aware load balancing, automated retries, and "golden signal" metrics has become compelling. But the same sidecar proxy model also makes the service mesh valuable for security and compliance owners. And just as with reliability, the service mesh lets security owners implement critical security features without requiring developers to do any work.
Let's start with the challenge that CISOs face today: how do they ensure security and compliance in cloud environments where they don't own the wires, don't own the machines, and have no control over the underlying network?
The security guarantees we once derived from controlling the hardware must now be established in software. To tackle this, a crop of technologies, concepts, and patterns such as mutual TLS, workload identity, and authorization policy (microsegmentation) has risen to the forefront.
At their heart, these are all software techniques that deliver security on top of insecure foundations. For the same reasons it was a compelling place for reliability features, the service mesh has also become a fantastic place to implement these security features. For example, one big driver of service mesh adoption is mutual TLS—a variant of TLS that ensures that every time two pods communicate, within or across clusters, they establish a secure channel that is not just encrypted and protected from tampering but also authorized, based on an identity intrinsic to the pod on each side. Long gone are the days of relying on IP addresses to provide any kind of identity, or of establishing plaintext TCP connections and considering the job done.
Similarly, authorization policy means that every pod can explicitly authorize the connections and traffic it receives, based not just on the workload identity of the client, but also on the (encrypted, unmodifiable) route, path, or method it's requesting!
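As a sketch of what such a policy can look like (using Linkerd's policy resources as one example; the workload names, namespace, and client identity below are hypothetical), a pod's port is described as a Server, and traffic to it is then permitted only from authorized mesh identities:

```yaml
# Describe the port on the workload that policy applies to.
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: web-http                 # hypothetical resource name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: web                   # hypothetical workload label
  port: 8080
  proxyProtocol: HTTP/1
---
# Authorize only a specific client workload identity, verified via mutual TLS.
apiVersion: policy.linkerd.io/v1beta1
kind: ServerAuthorization
metadata:
  name: web-http-authz
  namespace: default
spec:
  server:
    name: web-http
  client:
    meshTLS:
      identities:
      - "frontend.default.serviceaccount.identity.linkerd.cluster.local"  # hypothetical client identity
```

The enforcement happens in the sidecar proxy of the receiving pod: connections that don't present the authorized mTLS identity are refused before they ever reach the application.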
For practitioners of the often-nebulous world of zero trust, the sidecar model used by the service mesh offers a concrete solution: the proxy in each pod acts as the enforcement point, controlling all network access to the application components within. This aligns directly with the zero-trust directive of "enforce everywhere, every time."
Finally, there are the practical considerations of service mesh adoption within an organization. The sidecar model captures the identity and policy layers outside of the application code, where they can be owned, monitored, and controlled by security or platform teams. As the countless security-conscious organizations adopting the service mesh today have shown, this separation is crucial: it lets security owners retain control over policies and posture without incurring a new dependency on developers or network engineering teams.
Like Kubernetes itself, the world of cloud security functions as a rapidly shifting ecosystem that can often feel like sailing in stormy waters. For adopters of the service mesh, at least they may find a lighthouse on the horizon.
William Morgan, co-founder and CEO, Buoyant