In my previous Kubernetes blog I discussed the benefits of using BlueField SmartNICs to simplify provisioning of Kubernetes clusters in bare-metal infrastructures. A key takeaway from that post was the rapid shift toward bare-metal Kubernetes for delivering high-performance workloads across public, on-prem and edge environments. The topic is still very much trending as we see telco giants AT&T, Verizon and China Mobile, among others, investing in bare-metal cloud infrastructures to deliver enhanced digital experiences.
Going into KubeCon San Diego next week and following the introduction of the new ConnectX-6 Dx SmartNICs and BlueField-2 I/O Processing Units, this blog provides updates about our path to deliver high-throughput, low-latency Kubernetes network solutions at scale.
Accelerating Cloud-Native ML/AI Applications
Kubernetes plays an important role in the emergent ML/AI application ecosystem as new applications are built from the ground up as microservices. Mellanox ConnectX and BlueField SmartNICs offer in-hardware acceleration for Remote Direct Memory Access (RDMA/RoCE) communications, delivering best-in-class AI application performance and usability.
Partnering with the Kubernetes Network Plumbing working group, our team has delivered an open-source, generic CNI and device plug-in for attaching SR-IOV network interfaces with RDMA support to a Kubernetes pod. What’s more, Red Hat has recently released its flagship OpenShift Container Platform 4.2 with inbox support for RDMA/RoCE communications over Mellanox ConnectX-4 Lx and ConnectX-5 SmartNICs. Red Hat OpenShift incorporates the community’s work in this space and delivers an enterprise experience for deploying ML/AI applications on bare-metal computing infrastructures, with enhanced performance and efficiency.
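To make the attachment flow more concrete, here is a minimal sketch of the kind of pod manifest that requests an SR-IOV virtual function and a secondary network. The annotation key comes from the Network Plumbing working group's multi-network convention. The network name, resource name and image used below are hypothetical placeholders, since the real values depend on how the device plug-in and the network attachment are configured in a given cluster.

```python
import json

# Annotation key from the Network Plumbing Working Group multi-network
# convention (read by Multus and compatible meta-plugins).
NETWORKS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"


def rdma_pod_manifest(name, image, network, resource, count=1):
    """Build a pod manifest that requests `count` SR-IOV VFs and attaches `network`.

    `network` and `resource` are cluster-specific names; the values used in
    the demo below are hypothetical.
    """
    vf_request = {resource: str(count)}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {NETWORKS_ANNOTATION: network},
        },
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # RDMA applications pin memory for registration, which
                    # typically requires the IPC_LOCK capability.
                    "securityContext": {"capabilities": {"add": ["IPC_LOCK"]}},
                    # Extended resources are requested through limits;
                    # requests, when present, must equal limits.
                    "resources": {
                        "requests": dict(vf_request),
                        "limits": dict(vf_request),
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    pod = rdma_pod_manifest(
        name="rdma-app",                      # hypothetical
        image="example/rdma-app:latest",      # hypothetical
        network="sriov-rdma-net",             # hypothetical attachment name
        resource="example.com/sriov_rdma",    # hypothetical resource name
    )
    print(json.dumps(pod, indent=2))
```

The printed JSON can be passed to `kubectl apply -f -`. The device plug-in advertises the virtual functions as an extended resource, and the CNI wires the allocated interface into the pod when the scheduler places it.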
It was fascinating to witness Nvidia’s announcement of its revolutionary EGX Supercomputer platform at MWC Los Angeles, which puts Kubernetes right in the middle of the stack to simplify how organizations deploy, manage and scale AI applications at the network’s edge. Nvidia also published a list of NGC-Ready for Edge systems (NGC stands for Nvidia GPU Cloud), many of which support Mellanox SmartNICs with built-in RDMA/RoCE accelerators, including the HPE ProLiant DL380 Gen10, Dell PowerEdge R640, and more.
Today, with RDMA/RoCE integrated into the mainstream code of popular AI/ML frameworks, including TensorFlow, MXNet, Microsoft Cognitive Toolkit, and others, we expect to see more Kubernetes and OpenShift deployments that take advantage of the predictable and scalable performance that RDMA delivers.
Accelerating Cloud-Native Networking
A question DevOps engineers frequently ask on their Kubernetes journey is which networking stack would work best for their cluster. Kubernetes defines a networking model but does not ship a default implementation of it; instead, there is an ample choice of stacks, including open-source and commercial options.
As a leading contributor to the Open vSwitch (OVS) community, Mellanox opted to integrate our advanced ASAP2 – Accelerated Switch and Packet Processing® technology with the popular OVN networking model for Kubernetes, which also utilizes OVS. Open Virtual Network (OVN) complements the existing capabilities of OVS with native virtual networking support and delivers production-quality implementation that can operate at scale. The choice of OVN also aligns with the Red Hat OpenShift roadmap, which recently introduced OVN support as a technical preview feature.
Mellanox ASAP2 advanced switching and packet processing technology is built into the Mellanox SmartNICs and delivers breakthrough cloud networking performance. At the heart of ASAP2 is the “eSwitch” – an embedded switch built into Mellanox SmartNICs. The beauty of the eSwitch lies in how it allows the SmartNICs to handle a large portion of the packet processing operations in the hardware, freeing up the host’s CPU, and providing high-throughput connectivity for virtual machines/containers. ASAP2 leverages the eSwitch to deliver the best of both worlds – the performance and efficiency of bare-metal server networking hardware, with the flexibility of software-defined networking (SDN).
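The division of labor described above can be illustrated with a small conceptual model. This is not the ASAP2 or OVS API. It is a sketch of the general flow-offload pattern, under the assumption that the first packet of a flow is classified in software and the resulting rule is then installed in a hardware table that handles the rest of the flow. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlowKey:
    """Simplified match fields identifying a flow (hypothetical subset)."""
    src_ip: str
    dst_ip: str
    dst_port: int


@dataclass
class OffloadDatapath:
    """Toy model of a software slow path backed by a hardware flow table."""
    policy: dict                                   # FlowKey -> action (software view)
    hw_table: dict = field(default_factory=dict)   # rules offloaded to the NIC
    slow_path_hits: int = 0
    fast_path_hits: int = 0

    def process(self, key):
        # Fast path: the embedded switch already holds a rule for this flow.
        action = self.hw_table.get(key)
        if action is not None:
            self.fast_path_hits += 1
            return action
        # Slow path: classify in software on the host CPU, then offload
        # the rule so later packets of the flow stay in hardware.
        self.slow_path_hits += 1
        action = self.policy.get(key, "drop")
        self.hw_table[key] = action
        return action


if __name__ == "__main__":
    flow = FlowKey("10.244.0.3", "10.244.1.5", 8080)
    dp = OffloadDatapath(policy={flow: "forward:vf2"})
    for _ in range(1000):
        dp.process(flow)
    print(dp.slow_path_hits, dp.fast_path_hits)  # prints: 1 999
```

In this model only the first packet of the flow costs host CPU cycles, which is the intuition behind pairing SDN flexibility with hardware forwarding performance.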
The next step was to enable the OVN CNI to take advantage of the OVS hardware offload capabilities. The team is currently working to integrate the solution with the OVS connection tracking module, so that the NIC hardware can offload the NAT functions that underpin pod-to-pod communication in a Kubernetes cluster.
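For readers less familiar with why connection tracking and NAT go together, the following toy model may help. It is a sketch only, not the OVS conntrack module or any NIC interface. It assumes the common case where a client pod addresses a virtual service IP, the first packet of a connection selects a backend pod, and the tracked state is reused both for later packets and for translating replies back. All names and addresses are hypothetical.

```python
class ConnTrackNat:
    """Toy connection tracker that performs destination NAT for service VIPs."""

    def __init__(self, services):
        # (vip, port) -> list of (pod_ip, pod_port) backends
        self.services = services
        # (client_ip, client_port, vip, vip_port) -> chosen backend
        self.forward = {}
        # (pod_ip, pod_port, client_ip, client_port) -> original (vip, vip_port)
        self.reverse = {}

    def outbound(self, src, sport, dst, dport):
        """Translate a client-to-service packet; returns the rewritten 4-tuple."""
        key = (src, sport, dst, dport)
        backend = self.forward.get(key)
        if backend is None:
            backends = self.services.get((dst, dport))
            if not backends:
                return key  # not a service address, pass through unchanged
            # New connection: pick a backend (simple rotation) and remember it.
            backend = backends[len(self.forward) % len(backends)]
            self.forward[key] = backend
            self.reverse[(backend[0], backend[1], src, sport)] = (dst, dport)
        return (src, sport, backend[0], backend[1])

    def reply(self, src, sport, dst, dport):
        """Translate a backend-to-client packet back to the service address."""
        original = self.reverse.get((src, sport, dst, dport))
        if original is None:
            return (src, sport, dst, dport)
        return (original[0], original[1], dst, dport)


if __name__ == "__main__":
    nat = ConnTrackNat({("10.96.0.10", 80): [("10.244.1.5", 8080), ("10.244.2.7", 8080)]})
    print(nat.outbound("10.244.0.3", 40000, "10.96.0.10", 80))
    print(nat.reply("10.244.1.5", 8080, "10.244.0.3", 40000))
```

Because every packet of an established connection needs the same per-connection lookup and header rewrite, this state is a natural candidate for hardware offload once the connection has been set up.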
We look forward to introducing the complete solution in the Q1 timeframe, bringing accelerated Kubernetes network connectivity together with advanced SDN features.
Securing Cloud-Native Workloads
Kubernetes security is an immense challenge composed of many highly interrelated parts. The shift from a monolithic model to today’s prominent microservices architecture has completely transformed the way enterprises ship applications at scale. At the same time, cloud-native applications generate intensive data movement between services and physical machines to satisfy a single application request. Traffic volumes and latency requirements often lead teams to forgo zero-trust security solutions rather than risk degrading application performance. This creates inherent challenges within the enterprise for both the DevOps team, whose task is to ensure high-quality application delivery, and the security team, whose primary goal is to protect customers’ data and privacy.
The Mellanox ConnectX-6 Dx, now shipping, and the recently introduced BlueField-2 SmartNIC solutions provide hardware-accelerated, software-defined crypto capabilities that are ideally positioned to secure cloud-native workloads. The Kubernetes platform and its vibrant ecosystem of popular third-party components, including microservices firewalls, service mesh platforms and ingress gateways, widely use Transport Layer Security (TLS) for data-in-motion encryption between different system components. A notable example, natively featured in the platform, is the encryption of Kubernetes API traffic. A more advanced solution would be to introduce pod-to-pod communication encryption, as featured in the leading service mesh platforms, Istio and Linkerd.
The team is currently researching the various platforms and integration schemes to leverage the cutting-edge TLS encryption acceleration engines in our SmartNICs for securing cloud-native workloads at 100Gb/s, full wire speed!
We are excited to partner with the cloud-native ecosystem to bring to market advanced Kubernetes network solutions that scale beyond the public cloud to on-premises and edge data centers.
If you’re attending KubeCon San Diego, we’d be happy to get together to learn about your goals and further the advancement of cloud-native computing. Please drop me a note at email@example.com to set up a meeting.
- Check out this white paper to learn more about how Mellanox SmartNICs accelerate cloud-native ML/AI applications.
- Visit Mellanox.com for more information about Mellanox SmartNICs.