Starting my career as an IT engineer in the early 2000s exposed me to early discussions about hypervisors and virtual machines, including how they could save time on server provisioning. I was intrigued by the way server virtualization disrupted enterprise IT over the years, delivering infrastructure efficiency and automation. By the time I moved into a business role in 2009, most workloads were running on highly distributed virtual environments, with just a handful of powerful bare-metal servers running high-speed SQL databases for performance-sensitive workloads.
Today, the Kubernetes container orchestration platform is the de facto driving force for agile delivery of cloud-native applications. Throughout the emergence and development of Kubernetes, most of its deployments have used virtual machines as the underlying infrastructure platform, hosted either on public clouds or in on-premises data centers.
Lately, we have seen a growing trend of building new Kubernetes clusters from the ground up on bare-metal server infrastructure, eliminating the need to deploy hypervisors to abstract the physical hardware. We can largely attribute this shift to several key trends in the cloud-native ecosystem, including the rising demand for high-performance workloads such as big data analytics, machine learning and artificial intelligence. These are driving system architects and cloud operators to take the hypervisor out of the equation and run applications straight on metal for better performance. This demand has also been fueled by recent Kubernetes framework enhancements, including GPU-powered node enablement and CPU and memory resource management, which are collectively geared toward delivering superior performance and scale. Another reason enterprises and service providers undergoing digital transformation are embracing bare-metal Kubernetes lies in the push toward deploying workloads at the network’s edge. To unleash the full potential of edge computing, the underlying infrastructure must be optimized for performance, ultra-low latency and resiliency. Bare-metal servers that provide direct hardware access, coupled with a leading edge-computing software stack, typically outperform hypervisor-based platforms at the edge.
While bare-metal Kubernetes clusters deliver on the promise of performance, they also introduce a host of challenges around security, data storage and operations. In this blog I will introduce the advanced Mellanox BlueField™ SmartNIC and show how it empowers bare-metal Kubernetes clusters.
Mellanox BlueField SmartNIC is the world’s leading fully programmable network adapter. Integrating the best-in-class Mellanox ConnectX® network adapter with a set of Arm processors makes the BlueField SmartNIC capable of delivering powerful functionality for cloud data centers, high-performance networking and storage applications. The combination of programmable hardware acceleration engines with general-purpose software and advanced network capabilities also makes BlueField an ideal platform for bare-metal provisioning, storage virtualization and more.
BlueField provides built-in functional isolation between the host CPU and BlueField’s Arm-based system, protecting each individual workload while providing flexible control and visibility at the server level, reducing risk and increasing efficiency.
While enterprises opt to deploy bare-metal Kubernetes to obtain direct access to the underlying hardware, they also need to install suitable device drivers to use it. Traditionally, customers don’t like installing drivers on their systems, and for good reason: drivers add significant overhead to bare-metal provisioning and software management, since images must be customized to include them. This overhead is dramatically reduced in hypervisor-based environments, primarily because the hypervisor abstracts the hardware, making it unnecessary to install device drivers in guest virtual machines.
To address these challenges, the Mellanox BlueField SmartNIC emulates a VirtIO network interface toward the bare-metal host operating system. Because the VirtIO network driver is part of mainline Linux, hosts gain network connectivity without deploying vendor-specific device drivers. BlueField’s hardware-accelerated VirtIO emulation delivers strong performance, infrastructure efficiency and operational agility.
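From the host’s point of view, the emulated NIC simply looks like an ordinary virtio_net device. As a minimal sketch, the snippet below parses `ethtool -i <iface>`-style output to confirm which driver an interface is bound to; the sample output and bus address are illustrative, not taken from a real BlueField system.

```python
def parse_driver_info(ethtool_output: str) -> dict:
    """Parse `ethtool -i <iface>`-style output into a key/value dict."""
    info = {}
    for line in ethtool_output.strip().splitlines():
        key, _, value = line.partition(":")  # split on the first colon only
        info[key.strip()] = value.strip()
    return info

# Sample output as it might appear on a bare-metal host whose NIC is
# emulated as a VirtIO device (values are hypothetical).
sample = """\
driver: virtio_net
version: 1.0.0
bus-info: 0000:3b:00.1
"""

info = parse_driver_info(sample)
# The host sees the standard in-box virtio_net driver -- no vendor
# driver had to be baked into the provisioning image.
print(info["driver"])  # virtio_net
```

On a real host you would feed this function the output of `ethtool -i` for the interface in question; seeing `virtio_net` here is what makes driver-free provisioning possible.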
Bare-metal cloud environments usually install storage media in every host to deliver the best application performance. This comes at a price for the cloud operator, limiting their ability to efficiently provision remote storage, which is easier to migrate and protect. Therein lies a conflict when designing a bare-metal environment: what is best for the application (local storage) versus what is best and most easily composable for the cloud operator (networked storage). By leveraging Mellanox BlueField NVMe SNAP technology, cloud operators can now virtualize bare-metal Kubernetes storage with zero impact on the applications, in effect creating a win-win for both. Bare-metal hosts continue to use their operating system’s standard NVMe PCIe driver with little to no performance degradation, while the service provider gains a richer offering with greater efficiency: storage is now virtualized, thin-provisioned and backed up, and can be migrated between servers, yielding savings in both CAPEX and OPEX.
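The key idea is the indirection: the host enumerates an ordinary local NVMe device while the SmartNIC silently maps it to networked storage. The toy model below sketches that mapping; the device path and backend address are entirely hypothetical and stand in for whatever the operator’s storage fabric actually exposes.

```python
# Toy model of the NVMe SNAP indirection: the host's standard NVMe
# driver sees a local namespace, while the SmartNIC maps it to a
# remote (networked) backing store. Names below are hypothetical.

class EmulatedNvmeNamespace:
    def __init__(self, host_device: str, backend: str):
        self.host_device = host_device  # what the host's NVMe driver sees
        self.backend = backend          # where the data actually lives

    def migrate(self, new_backend: str) -> None:
        """Re-point the namespace at a new backend -- invisible to the host."""
        self.backend = new_backend

    def describe(self) -> str:
        return f"{self.host_device} -> {self.backend}"

ns = EmulatedNvmeNamespace("/dev/nvme0n1", "nvme-of://storage-pool-a/vol17")
print(ns.describe())          # /dev/nvme0n1 -> nvme-of://storage-pool-a/vol17
ns.migrate("nvme-of://storage-pool-b/vol17")
print(ns.host_device)         # unchanged: the host never notices
```

The `migrate` call illustrates why operators care: the backing volume can move between storage pools while the host-visible device name, and therefore the application, stays untouched.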
Virtualized environments have evolved over the years to offer a range of integrated security services built on the foundation of a unified, distributed software control plane for compute and networking. A notable example is micro-segmentation, which lets you enforce policies on the connectivity between workloads and application domains across the data center. But deploying bare-metal servers for your Kubernetes cluster means you can no longer implement hypervisor-based micro-segmentation. Many security vendors offer competing agent-based solutions that can be deployed on bare-metal server infrastructure, but the challenge here is two-fold: deploying security agents in an environment optimized for performance and DevOps automation is often undesirable, and in some cases deploying agents in certain workloads violates regulatory or compliance requirements and is therefore not permitted. Mellanox BlueField SmartNIC is perfectly positioned to enable agentless, high-performance security in bare-metal Kubernetes environments.
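At its core, micro-segmentation is default-deny connectivity with an explicit allow-list between workload groups. The sketch below models that idea with a toy policy evaluator; the labels, ports and rules are illustrative and not a real policy engine or any vendor’s API.

```python
# Minimal sketch of micro-segmentation policy evaluation: traffic is
# denied by default, and each rule whitelists one flow between two
# labeled workload groups on a given port. Rules are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    src_label: str
    dst_label: str
    port: int

POLICY = {
    Rule("frontend", "api", 443),   # frontend may call the API over TLS
    Rule("api", "db", 5432),        # only the API tier may reach the DB
}

def is_allowed(src_label: str, dst_label: str, port: int) -> bool:
    """Default-deny: only explicitly whitelisted flows pass."""
    return Rule(src_label, dst_label, port) in POLICY

print(is_allowed("frontend", "api", 443))   # True
print(is_allowed("frontend", "db", 5432))   # False: no direct path to the DB
```

The point of offloading this check to the SmartNIC rather than a host agent is that the evaluation happens outside the host’s trust domain: even a compromised host cannot tamper with the rule set.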
Due to its unique form factor and features, the BlueField SmartNIC acts as a “computer-in-front-of-a-computer,” enabling applications to run on its CPU, fully isolated from the host’s CPU and operating system. This isolation allows software agents to run on the SmartNIC when they cannot run on the host system, making BlueField a strong fit for a range of cyber-security solutions, including resilient micro-segmentation, stateful next-generation firewalls, cloud-scale anti-DDoS and more. By separating security controls from the host, BlueField’s isolation also ensures that if a host is compromised, the attack won’t spread further throughout the data center.
Deploying BlueField SmartNICs in the data center, and specifically in bare-metal environments, gives security teams enhanced visibility across cloud domains and enforces a consistent security policy across the enterprise, while offering unmatched performance.
Kubernetes plays an important role in the emerging AI application ecosystem, as new applications are built from the ground up as microservices. A key reason Kubernetes works so well for AI is that it abstracts infrastructure management, enabling data scientists and software developers to focus their time and effort on building effective AI-driven applications instead of managing infrastructure.
Mellanox BlueField SmartNIC offers in-hardware acceleration for Remote Direct Memory Access (RDMA/RoCE) communications, delivering best-in-class performance and usability. RDMA is a network technology that allows direct memory access from one computer into that of another, without involving either computer’s operating system or CPU. RDMA is especially useful in massively parallel compute clusters, as it permits high-throughput, low-latency networking. When an application performs an RDMA Read or Write request, the system delivers the application data directly to the network (zero-copy, fully offloaded by the network adapter), reducing latency and enabling fast message transfer. RDMA over Converged Ethernet (RoCE) is a network protocol that allows RDMA to run over an Ethernet network.
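The distinctive property of a one-sided RDMA Write is that the initiator places data directly into a registered memory region on the target, with no receive-side software in the data path. As a loose in-process analogy (real RDMA uses verbs via libibverbs and pinned, registered memory; none of that is modeled here), the sketch below writes a payload straight into a buffer at a given offset:

```python
# Illustrative analogy for RDMA one-sided Write semantics: the
# initiator deposits a payload directly at an offset inside the
# target's "registered" memory region, with no receiver code running.
# This is an in-process toy, not a real RDMA implementation.

registered_region = bytearray(64)  # stands in for a pinned, registered MR

def rdma_write(region: bytearray, offset: int, payload: bytes) -> None:
    """One-sided write: place payload directly at offset in the target MR."""
    region[offset:offset + len(payload)] = payload

rdma_write(registered_region, 8, b"hello")
print(bytes(registered_region[8:13]))  # b'hello'
```

In real RDMA the same placement is performed by the network adapter itself, which is why neither CPU sees a copy and latency stays low.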
RDMA/RoCE support is integrated today into the mainstream code of popular ML/AI frameworks, including TensorFlow, Microsoft Cognitive Toolkit and others. Native RDMA/RoCE support in AI frameworks lets applications built on them take advantage of the predictable, scalable performance that RDMA delivers.
Mellanox has been working in the Linux and Kubernetes communities on a standardized solution to enable RDMA and RoCE transport technologies for containerized applications. The solution enables enterprises to run AI applications based on the various ML/AI frameworks on bare-metal Kubernetes, with Mellanox BlueField SmartNICs providing accelerated network performance.
As Kubernetes continues its path into mainstream commercial solutions in 5G wireless networks, autonomous vehicles, industrial IoT and more, enterprises and service providers will turn to bare-metal clouds to achieve higher ROI and lower TCO on their infrastructure. Mellanox BlueField SmartNIC is uniquely positioned to transform bare-metal servers with the unmatched performance, security and operational agility needed to unleash the full potential of bare-metal infrastructure.
To learn more about Mellanox BlueField SmartNICs for bare-metal clouds and Kubernetes, watch this video: https://youtu.be/lQAN9SRviDQ, check out this solution brief, or visit www.mellanox.com/products/smartnic.
Visit Mellanox at KubeCon + CloudNativeCon Barcelona, Booth S33, where we will be showcasing our award-winning end-to-end Ethernet portfolio including intelligent and smart adapters, switches and cables.