Achieving a Cloud Scale Architecture with DPU based SmartNICs


Why You Need a DPU (Data Processing Unit) based SmartNIC and Smart NIC Use Cases

In the first blog of this series, I argued that it is function, not form, that defines a DPU (Data Processing Unit) based SmartNIC. I also introduced another category of data center NICs called the intelligent NIC (iNIC), which includes both hardware transport and a programmable data path for virtual switch acceleration. These capabilities are necessary but not sufficient for a NIC to be a SmartNIC. A true SmartNIC must also include an easily extensible, C-programmable Linux environment that enables data center architects to virtualize all resources in the cloud and make them appear local. To understand why SmartNICs need this, let's go back to what created the need for smarter NICs in the first place.


Why the World Needs DPU (Data Processing Unit) based SmartNICs

One of the most important reasons why the world needs DPU (Data Processing Unit) based SmartNICs is that modern workloads and data center designs impose too much networking overhead on the CPU cores. With faster networking (now up to 200Gb/s per link), the CPU spends too many of its valuable core cycles classifying, tracking, and steering network traffic. These expensive CPU cores are designed for general-purpose application processing, and the last thing you want is to consume all that processing power simply looking at and managing the movement of data. After all, application processing that analyzes data and produces results is where the real value creation occurs.

The introduction of compute virtualization makes this problem worse, as it creates more traffic on the server both internally–between VMs or containers—and externally to other servers or storage. Applications such as software-defined storage (SDS), hyperconverged infrastructure (HCI), and big data also increase the amount of east-west traffic between servers, whether virtual or physical, and often Remote Direct Memory Access (RDMA) is used to accelerate data transfers between servers.

On top of these traffic increases, the use of overlay networks such as VXLAN, NVGRE, or Geneve, increasingly popular for public and private clouds, further complicates the network by introducing layers of encapsulation. Software-defined networking (SDN) imposes additional packet steering and processing requirements and burdens the CPU with even more work, such as running Open vSwitch (OVS).

Smarter NICs can handle all this virtualization (SRIOV, RDMA, overlay network traffic encapsulation, OVS offload) faster, more efficiently, and at lower cost than standard CPUs.
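As a concrete illustration, OVS hardware offload on a capable NIC is typically enabled with just a few commands. This is a minimal sketch, assuming a Linux host running Open vSwitch and a NIC that supports TC flower offload; the interface name `enp3s0f0` and the service name are placeholders that vary by system.

```shell
# Enable TC flower hardware offload on the NIC port (interface name is a placeholder)
ethtool -K enp3s0f0 hw-tc-offload on

# Tell Open vSwitch to push datapath flows down into NIC hardware
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

# Restart OVS so the setting takes effect (service name varies by distribution)
systemctl restart openvswitch-switch
```

After this, established flows are matched and forwarded by the NIC's embedded switch rather than by CPU cycles in the kernel datapath.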


Another Reason for the DPU (Data Processing Unit) based SmartNIC—Security Isolation

In addition, sometimes you want to isolate the networking from the CPU for security reasons. The network is the most likely vector for a hacker attack or malware intrusion, and the first place you'd look to detect or stop a hack. It's also the most likely place you'll want to implement in-line encryption. The DPU based SmartNIC, being a NIC, is the first, easiest, and best place to inspect network traffic, block attacks, and encrypt transmissions. This has both performance and security benefits: it eliminates the frequent need to route all incoming and outgoing data back to the CPU and across the PCIe bus, and it provides security isolation by running separately from the main CPU. If the main CPU is compromised, the DPU based SmartNIC can still detect or block malicious activity without immediately involving the CPU.

The security benefits of a DPU based SmartNIC are covered more in this Security blog by Bob Doud.

The Mellanox BlueField Smart NIC offers extremely fast hardware acceleration of networking and storage functions plus C-programmable Arm cores

Virtualizing Storage and the Cloud

A newer use case for DPU based SmartNICs is to virtualize software-defined storage, hyperconverged infrastructure, and other cloud resources. Before the virtualization explosion, most servers just ran local storage, which is not always efficient but is easy to consume: every OS, application, and hypervisor knows how to use local storage. Then came the rise of network storage: SAN, NAS, and more recently NVMe over Fabrics (NVMe-oF). But not every application is natively SAN-aware, and some operating systems and hypervisors (like Windows and VMware) don't speak NVMe-oF yet. Something smart NICs can do is make networked storage, which is more efficient and easier to manage, look like local storage, which is easier for applications to consume. A DPU based SmartNIC could even virtualize GPUs (or other neural network processors) so that any server can access as many GPUs as it needs, whenever it needs them, over the network.
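To make the local-versus-networked distinction concrete, here is what attaching an NVMe-oF target looks like on a Linux host with the standard nvme-cli tool; once connected, the remote namespace appears as an ordinary local block device. The target address, port, and NQN below are hypothetical.

```shell
# Discover NVMe-oF subsystems exported by a target over RDMA
# (address, port, and subsystem name are hypothetical)
nvme discover -t rdma -a 192.168.1.10 -s 4420

# Connect to a subsystem; the remote namespace then shows up
# as a local /dev/nvmeXnY block device
nvme connect -t rdma -a 192.168.1.10 -s 4420 \
    -n nqn.2019-01.com.example:storage1

# List NVMe devices; the networked namespace looks like local storage here
nvme list
```

A DPU based SmartNIC can take this a step further by presenting the emulated NVMe device to the host in hardware, so the host OS needs no NVMe-oF support at all.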

A similar advantage applies to software-defined storage and hyperconverged infrastructure, as both use a management layer (often running as a VM or as part of the hypervisor itself) to virtualize and abstract the local storage, and the network, to make it available to other servers or clients across the cluster. This is wonderful for rapid deployments on commodity servers and is good at sharing storage resources, but the layer of management and virtualization soaks up many CPU cycles that should be running the applications. And as with standard servers, the faster the networking runs and the faster the storage devices are, the more CPU must be devoted to virtualizing these resources.

Once again, enter the intelligent DPU based NIC (smarter NIC) or the DPU based SmartNIC. The first offloads and helps virtualize the networking (accelerating private and public clouds, which is why they are sometimes called CloudNICs), and the second can offload both the networking and much or all of the storage virtualization. SmartNICs can also offload a wide variety of functions for SDS and HCI, such as compression, encryption, deduplication, RAID, and reporting, all in the name of sending expensive CPU cores back to what they do best: running applications.

Mellanox ConnectX adapters are intelligent NICs and offer hardware offload of popular networking and virtualization functions

Choosing the best DPU based SmartNIC—Must have Hardware Acceleration

Having covered the major DPU based SmartNIC use cases, we know why we need these NICs and where they can provide the greatest benefit. They must be able to accelerate and offload network traffic, and they might also need to virtualize storage resources, share GPUs over the network, support RDMA, and perform encryption. So what are the top SmartNIC requirements? First, all SmartNICs (and smarter NICs) must have hardware acceleration. Hardware acceleration offers the best performance and efficiency, which also means more offloading with less spending. The ability to have dedicated hardware for certain functions is key to the raison d'être of DPU based SmartNICs.


Must be Programmable

While most of the acceleration functions must run in hardware for the best performance, the control and programming of those functions needs to run in software for the greatest flexibility.

There are many functions that could be programmed on a smart NIC, a few of which are outlined in the feature table of my previous blog. Usually the specific offload methods, encryption algorithms, and transport mechanisms don't change much, but the routing rules, flow tables, encryption keys, and network addresses change all the time. We recognize the former as data plane functions and the latter as control plane functions. The data plane rules and algorithms can be coded into silicon once they are standardized and established. The control plane rules and programming change too quickly to be hard coded into silicon, but they can run on an FPGA (modified occasionally, but with difficulty) or in a C-programmable Linux environment (modified easily and often).
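On Linux, this split is visible in the TC flower interface: software acting as the control plane installs a flow rule, and the skip_sw flag requests that the matching and action execute entirely in NIC hardware, i.e. the data plane. A sketch, assuming an offload-capable NIC; the interface name and IP address are placeholders.

```shell
# Control plane: attach an ingress classifier hook to the NIC port
tc qdisc add dev enp3s0f0 ingress

# Install a flow rule; skip_sw asks for match + action to run
# in NIC hardware only (fails if the NIC cannot offload it)
tc filter add dev enp3s0f0 ingress protocol ip flower skip_sw \
    dst_ip 10.0.0.1 action drop

# Inspect the rule; offloaded rules are marked "in_hw"
tc -s filter show dev enp3s0f0 ingress
```

The rule's lifecycle (install, update, delete) is control plane work; every packet matched against it afterwards is data plane work the CPU never sees.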

How much of the Programming Needs to Live on the DPU based SmartNIC?

We have a choice of how much of a DPU based SmartNIC's programming is done on the smart NIC itself. That is, the NIC's handling of packets must be hardware-accelerated and programmable, but the control of that programming can live on the NIC or elsewhere. In the former case, we say the NIC has both a programmable data plane (executing the packet processing rules) and a programmable control plane (setting up and managing the rules). In the latter case, the NIC only runs the data plane while the control plane lives somewhere else, such as on the CPU.

For example, with Open vSwitch, the packet switching can be done in software or hardware, and the control plane can run on the CPU or on the DPU based SmartNIC. With a regular foundational or “dumb” NIC, all the switching and control is done by software on the CPU. With a smarter NIC, the switching runs on the NIC's ASIC but the control is still on the CPU. With a true DPU based SmartNIC, the switching is done by ASIC-type hardware on the NIC while the control plane also runs on the NIC, in easily programmable Arm cores.
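You can observe where this split lands in practice: with OVS hardware offload active, the flows handled by the NIC's embedded switch can be listed separately from those still processed in software. A sketch, assuming a host where OVS offload has already been enabled:

```shell
# Flows executed in NIC hardware (the offloaded data plane)
ovs-appctl dpctl/dump-flows type=offloaded

# Flows still handled by the kernel datapath on the CPU
ovs-appctl dpctl/dump-flows type=ovs
```

On a well-offloaded system, the bulk of long-lived, high-volume flows should appear in the first list, leaving the CPU to handle only exceptions and first packets.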

With Open vSwitch offload, the packet processing is offloaded to a ConnectX NIC eSwitch while the control plane runs on a CPU, or both the packet processing and control plane can run on a BlueField SmartNIC

ConnectX-5 NIC offloads OVS switching to NIC hardware


So Which DPU based SmartNIC is Best for Me, and Which Mellanox Adapters are SmartNICs?

Both transport offload and a programmable data path with hardware offload for virtual switching are vital functions to achieve application efficiency in the data center. According to the definition in my earlier blog Defining a DPU based SmartNIC, these functions are part of an intelligent NIC and are table stakes on the path to a DPU based SmartNIC. But just transport and programmable virtual switching offload by themselves don’t raise an intelligent NIC to the level of a SmartNIC or Genius NIC.

Very often, customers tell us they must have a DPU based SmartNIC because they need programmable virtual switching hardware acceleration. This is mainly because a competing vendor with an expensive, barely programmable offering has told them a “SmartNIC” is the only way to achieve this. In this case we are happy to deliver this very same functionality with our ConnectX family of intelligent NICs, which after all are very smart NICs.

But by my reckoning there are a few more things required to take a NIC to the exalted level of a DPU based SmartNIC, such as running the control-plane on the NIC and offering C-programmability with a Linux environment. In those cases, we’re proud to offer our BlueField DPU based programmable SmartNIC, which includes all the smarter NIC features of our ConnectX adapters plus from 4 to 16 64-bit Arm cores, all of course running Linux and easily programmable.

As you plan your next infrastructure build-out or refresh, remember my key points:

  • DPU based SmartNICs are increasingly useful for offloading networking functions and virtualizing resources like storage, networking, and GPUs
  • Intelligent NICs (iNICs or smarter NICs) accelerate data plane tasks in hardware but run the control plane in software
  • The control plane software—and other management software—can run on the regular CPU or on the very smart (Genius) NICs
  • Mellanox offers best-in-class intelligent NICs (ConnectX), FPGA NICs (Innova), and fully programmable data plane/control plane DPU based SmartNICs (BlueField DPU based programmable SmartNIC)



About Kevin Deierling

Kevin Deierling has served as Mellanox's VP of marketing since March 2013. Previously he served as VP of technology at Genia Technologies, chief architect at Silver Spring Networks, and ran marketing and business development at Spans Logic. Kevin has contributed to multiple technology standards and has over 25 patents in areas including wireless communications, error correction, security, video compression, and DNA sequencing. He is a contributing author of a text on BiCMOS design. Kevin holds a BA in Solid State Physics from UC Berkeley. Follow Kevin on Twitter: @TechseerKD
