In the first blog of this series, I argued that it is function, not form, that defines a DPU (Data Processing Unit) based SmartNIC. I also introduced another category of data center NICs, the intelligent NIC (iNIC), which includes both hardware transport and a programmable data path for virtual switch acceleration. These capabilities are necessary but not sufficient for a NIC to be a SmartNIC. A true SmartNIC must also include an easily extensible, C-programmable Linux environment that enables data center architects to virtualize all resources in the cloud and make them appear local. To understand why SmartNICs need this, let's go back to what created the need for smarter NICs in the first place.
One of the most important reasons the world needs DPU (Data Processing Unit) based SmartNICs is that modern workloads and data center designs impose too much networking overhead on the CPU cores. With faster networking (now up to 200Gb/s per link), the CPU spends too many of its valuable cycles classifying, tracking, and steering network traffic. These expensive CPU cores are designed for general purpose application processing, and the last thing they should do is burn all that processing power simply watching and managing the movement of data. After all, application processing that analyzes data and produces results is where the real value creation occurs.
The introduction of compute virtualization makes this problem worse, as it creates more traffic on the server, both internally (between VMs or containers) and externally (to other servers or storage). Applications such as software-defined storage (SDS), hyperconverged infrastructure (HCI), and big data also increase the amount of east-west traffic between servers, whether virtual or physical, and Remote Direct Memory Access (RDMA) is often used to accelerate data transfers between them.
On top of these traffic increases, the use of overlay networks such as VXLAN, NVGRE, or Geneve, which are increasingly popular in public and private clouds, further complicates the network by introducing layers of encapsulation. Software-defined networking (SDN) imposes additional packet steering and processing requirements and burdens the CPU with even more work, such as running the Open vSwitch (OVS).
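To make the encapsulation overhead concrete, here is a minimal Python sketch of the 8-byte VXLAN header defined in RFC 7348 being prepended to an inner Ethernet frame. This is an illustration only, not a real datapath: a real encapsulation also wraps the result in outer Ethernet, IP, and UDP headers (roughly 50 bytes of added overhead per packet), and on a SmartNIC all of this happens in hardware rather than in host software.

```python
import struct

VXLAN_HEADER_LEN = 8  # flags + reserved + 24-bit VNI + reserved

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags byte 0x08 (valid VNI),
    24 reserved bits, then the 24-bit VNI and 8 reserved bits."""
    return struct.pack("!II", 0x08 << 24, vni << 8)

def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prepend a VXLAN header to an inner Ethernet frame.
    (Outer Ethernet/IP/UDP headers are omitted for brevity.)"""
    return vxlan_header(vni) + inner_frame

frame = b"\x00" * 64                  # placeholder inner frame
pkt = encapsulate(frame, vni=5001)
assert len(pkt) == len(frame) + VXLAN_HEADER_LEN
```

Every packet on the wire carries this extra framing, which is exactly the kind of repetitive per-packet work that is cheap in NIC hardware but expensive on general purpose cores.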
Smarter NICs can handle all this virtualization (SR-IOV, RDMA, overlay network traffic encapsulation, OVS offload) faster, more efficiently, and at lower cost than standard CPUs.
In addition, sometimes you want to isolate the networking from the CPU for security reasons. The network is the most likely vector for a hacker attack or malware intrusion, and the first place you'd look to detect or stop a hack. It's also the most likely place you'll want to implement in-line encryption. The DPU based SmartNIC, being a NIC, is the first, easiest, and best place to inspect network traffic, block attacks, and encrypt transmissions. This has both performance and security benefits: it eliminates the frequent need to route all incoming and outgoing data back to the CPU and across the PCIe bus, and it provides security isolation by running separately from the main CPU. If the main CPU is compromised, the DPU based SmartNIC can still detect or block malicious activity without immediately involving the CPU.
The security benefits of a DPU based SmartNIC are covered in more detail in this Security blog by Bob Doud.
A newer use case for DPU based SmartNICs is to virtualize software-defined storage, hyperconverged infrastructure, and other cloud resources. Before the virtualization explosion, most servers just ran local storage, which is not always efficient but is easy to consume: every OS, application, and hypervisor knows how to use local storage. Then came the rise of network storage: SAN, NAS, and more recently NVMe over Fabrics (NVMe-oF). But not every application is natively SAN-aware, and some operating systems and hypervisors (like Windows and VMware) don't speak NVMe-oF yet. A SmartNIC can virtualize networked storage, which is more efficient and easier to manage, so it looks like local storage, which is easier for applications to consume. A DPU based SmartNIC could even virtualize GPUs (or other neural network processors) so that any server can access as many GPUs as it needs, whenever it needs them, over the network.
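A hypothetical sketch of the translation involved may help. All class and method names below are illustrative inventions, not any vendor's API; a real implementation lives in NIC firmware and hardware and issues NVMe commands over RDMA or TCP. The point is the shape of the abstraction: the host sees an ordinary local block read, while the NIC translates it into a fabric request behind the scenes.

```python
class RemoteNamespace:
    """A remote NVMe-oF namespace, addressed by target and NQN (illustrative)."""
    def __init__(self, target_addr: str, nqn: str):
        self.target_addr = target_addr
        self.nqn = nqn

    def read_blocks(self, lba: int, count: int) -> bytes:
        # A real device would issue an NVMe read over the fabric here;
        # we return zeroed 512-byte blocks as a stand-in.
        return b"\x00" * (count * 512)

class VirtualLocalDisk:
    """Presents a remote namespace to the host as if it were a local
    NVMe drive: the host only ever sees plain block reads."""
    def __init__(self, backend: RemoteNamespace):
        self.backend = backend

    def read(self, lba: int, count: int) -> bytes:
        # Translation step: local block request -> fabric request.
        return self.backend.read_blocks(lba, count)

disk = VirtualLocalDisk(RemoteNamespace("192.0.2.10:4420", "nqn.2023-01.example:store1"))
assert len(disk.read(lba=0, count=8)) == 8 * 512
```

Because the translation happens on the NIC, even an OS or hypervisor with no NVMe-oF support can consume fabric-attached storage as if it were local.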
A similar advantage applies to software-defined storage and hyperconverged infrastructure, as both use a management layer (often running as a VM or as part of the hypervisor itself) to virtualize and abstract the local storage, and the network, to make it available to other servers or clients across the cluster. This is wonderful for rapid deployments on commodity servers and is good at sharing storage resources, but the layer of management and virtualization soaks up many CPU cycles that should be running the applications. And as with standard servers, the faster the networking runs and the faster the storage devices are, the more CPU must be devoted to virtualizing these resources.
Once again, enter the intelligent NIC (iNIC) and the DPU based SmartNIC. The first offloads and helps virtualize the networking (accelerating private and public clouds, which is why these are sometimes called CloudNICs), and the second can offload both the networking and much or all of the storage virtualization. SmartNICs can also offload a wide variety of functions for SDS and HCI, such as compression, encryption, deduplication, RAID, and reporting, all in the name of sending the more expensive CPU cores back to what they do best: running applications.
Having covered the major DPU based SmartNIC use cases, we know why we need them and where they can provide the greatest benefit. They must be able to accelerate and offload network traffic, and they might also need to virtualize storage resources, share GPUs over the network, support RDMA, and perform encryption. So what are the top SmartNIC requirements? First, all SmartNICs (and smarter NICs) must have hardware acceleration. Hardware acceleration offers the best performance and efficiency, which means more offloading for less spending. The ability to dedicate hardware to certain functions is key to the raison d'être of DPU based SmartNICs.
While for the best performance most of the acceleration functions must run in hardware, for the greatest flexibility, the control and programming of these functions needs to run in software.
There are many functions that could be programmed on a smart NIC, a few of which are outlined in the feature table of my previous blog. Usually the specific offload methods, encryption algorithms, and transport mechanisms don't change much, but the routing rules, flow tables, encryption keys, and network addresses change all the time. We recognize the former as data plane functions and the latter as control plane functions. The data plane rules and algorithms can be coded into silicon once they are standardized and established. The control plane rules and programming change too quickly to be hard-coded into silicon, but they can run on an FPGA (modified occasionally, with difficulty) or in a C-programmable Linux environment (modified easily and often).
We have a choice about how much of a DPU based SmartNIC's programming is done on the NIC itself. That is, the NIC's handling of packets must be hardware-accelerated and programmable, but the control of that programming can live on the NIC or elsewhere. In the former case, we say the NIC has both a programmable data plane (executing the packet processing rules) and a programmable control plane (setting up and managing the rules). In the latter case, the NIC only runs the data plane while the control plane lives somewhere else, such as on the CPU.
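The split can be sketched in a few lines of Python. This is a conceptual model, not any product's interface: the control plane installs match/action rules (software, changes often), while the data plane applies them to every packet (the hot path, which a SmartNIC runs in hardware). A packet that matches no rule falls back to the control plane, much as a first packet does in a virtual switch.

```python
class FlowTable:
    """Toy model of the data plane / control plane split."""
    def __init__(self):
        self.rules = {}  # (src, dst) -> action

    # Control plane: changes frequently, runs in software
    # (on the NIC's Arm cores, or on the host CPU).
    def install_rule(self, src: str, dst: str, action: str):
        self.rules[(src, dst)] = action

    # Data plane: executed per packet; this is the part a
    # SmartNIC offloads into ASIC-type hardware.
    def forward(self, src: str, dst: str) -> str:
        return self.rules.get((src, dst), "send_to_control_plane")

table = FlowTable()
table.install_rule("10.0.0.1", "10.0.0.2", "forward_port_1")
assert table.forward("10.0.0.1", "10.0.0.2") == "forward_port_1"
assert table.forward("10.0.0.1", "10.0.0.9") == "send_to_control_plane"
```

The design question for any NIC is simply where each half of this loop runs: the rule lookup belongs in hardware for speed, while rule installation belongs in software for flexibility.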
For example, with Open vSwitch, the packet switching can be done in software or hardware, and the control plane can run on the CPU or on the DPU based SmartNIC. With a regular foundational or "dumb" NIC, all the switching and control is done by software on the CPU. With a smarter NIC, the switching runs on the NIC's ASIC but the control is still on the CPU. With a true DPU based SmartNIC, the switching is done by ASIC-type hardware on the NIC while the control plane also runs on the NIC, in easily programmable Arm cores.
Both transport offload and a programmable data path with hardware offload for virtual switching are vital functions to achieve application efficiency in the data center. According to the definition in my earlier blog Defining a DPU based SmartNIC, these functions are part of an intelligent NIC and are table stakes on the path to a DPU based SmartNIC. But just transport and programmable virtual switching offload by themselves don’t raise an intelligent NIC to the level of a SmartNIC or Genius NIC.
Very often we find customers who tell us they must have a DPU based SmartNIC because they need programmable hardware acceleration for virtual switching, mainly because a competitor with an expensive, barely programmable offering has told them a "SmartNIC" is the only way to achieve this. In this case we are happy to deliver this very same functionality with our ConnectX family of intelligent NICs, which after all are very smart NICs.
But by my reckoning there are a few more things required to take a NIC to the exalted level of a DPU based SmartNIC, such as running the control-plane on the NIC and offering C-programmability with a Linux environment. In those cases, we’re proud to offer our BlueField DPU based programmable SmartNIC, which includes all the smarter NIC features of our ConnectX adapters plus from 4 to 16 64-bit Arm cores, all of course running Linux and easily programmable.
As you plan your next infrastructure build-out or refresh, remember my key points: