Telecommunication service providers are betting big on technologies such as cloud and Network Function Virtualization (NFV) to revamp their infrastructure. The goal is to become more agile in creating new services, more elastic in scaling services on demand, and more economical in building out infrastructure. No wonder some of the leading service providers are also the strongest supporters and developers of NFV technologies: China Mobile, China Telecom, NTT, AT&T, Deutsche Telekom, Verizon, Vodafone…the list goes on and on.
One of the key trends in telco cloud and NFV is the shift away from running network functions on specialized appliances (such as a box built to run a firewall and only a firewall) toward running them on high-volume, general-purpose server, storage and switching platforms, sometimes called COTS (Commercial Off-the-Shelf) devices. It is natural to conclude that designing the infrastructure with the right COTS hardware to best support NFV applications is key to overall NFV performance and efficiency, and will ultimately determine how soon NFV can move beyond the proof-of-concept phase into real-world, large-scale deployments.
NFV applications, that is, Virtualized Network Functions (VNFs), place some unique requirements on compute and networking infrastructure. In this blog, we will focus on the most fundamental need: packet performance. Intuitively, network functions, whether physical or virtualized, process network traffic: VNFs look at packets, understand them, and take action based on pre-configured rules and policies. Packet performance is normally measured in millions of packets per second (Mpps). Note that this is different from throughput, which measures how fast an I/O system can move bit streams and is normally measured in gigabits per second (Gbps). For the same throughput, packet performance can vary significantly. For example, at a throughput of 10Gbps, the theoretical maximum packet rate is 14.88 Mpps for 64-byte packet streams but only 0.82 Mpps for 1500-byte packet streams. Small packet sizes put far more pressure on the I/O system's ability to sustain line-rate throughput.
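The numbers above follow directly from Ethernet framing overhead; a quick sketch (the helper name is ours, not from any library):

```python
def line_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Theoretical maximum packet rate for an Ethernet link.

    Every frame on the wire carries 20 extra bytes of fixed overhead:
    7-byte preamble + 1-byte start-of-frame delimiter + 12-byte
    inter-frame gap.
    """
    wire_bits = (frame_bytes + 20) * 8
    return link_gbps * 1e9 / wire_bits / 1e6

print(f"{line_rate_mpps(10, 64):.2f} Mpps")    # 64-byte frames at 10G -> 14.88
print(f"{line_rate_mpps(10, 1500):.2f} Mpps")  # 1500-byte frames at 10G -> 0.82
```

The same function reproduces the line-rate figures quoted throughout this post for any link speed and frame size.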
In carrier-grade networks, there is a substantial amount of control traffic that consists mainly of small packets. In addition, packets for real-time voice traffic are also just around 100 bytes long. To ensure that an Ethernet network is capable of supporting a variety of services (such as VoIP, video, etc.) and meeting high-availability requirements, the IETF-defined RFC 2544 benchmarking methodology is key. It uses seven pre-defined frame sizes (64, 128, 256, 512, 1024, 1280 and 1518 bytes) to simulate various traffic conditions and enable comprehensive Ethernet testing at service turn-up. RFC 2544 results and small-packet performance are deemed vital to ensuring service quality and increasing customer satisfaction. Unfortunately, some switches are unable to pass RFC 2544; a well-architected switch, however, can switch packets of any size with, wait for it, zero packet loss.
But with NFV, the goal is to swap out systems that have passed RFC 2544 for servers built with general-purpose CPUs, and neither these CPUs nor the operating systems running on them were designed for high-speed packet processing. If we look at the history of how network devices such as routers are built, the CPU has always been a key component. Routers started as software running on general-purpose processors. The AGS (Advanced Gateway Server), shipped in 1986 as Cisco's first commercial multiprotocol router, was based on a Motorola 68000 series CPU (M68k). Due to the speed limitations of this processor and other factors, packet switching performance was limited to the range of 7,000 to 14,000 packets per second (pps) even with fast switching, pretty anemic by today's standards. Fast forward three decades, and no reputable router vendor uses a CPU for the packet data path. Instead, all vendors use custom ASICs, commercial silicon, or network processors, which are much better equipped to support the tens or hundreds of gigabits of throughput that a router line card needs to push. CPUs on routers play a significant role only in the route processors, which run the control and management planes.
To enhance the CPU's capability to process packets, Intel and other contributing companies created the Data Plane Development Kit (DPDK), a set of data plane libraries and network interface controller drivers for fast packet processing. Aside from optimizing buffer management and other enhancements, DPDK changed the packet receive operation from push mode to poll mode, eliminating a number of interrupts, context switches and buffer copies in the Linux network stack to achieve a severalfold improvement in packet performance. But the downside is also easy to see: IT professionals who deploy DPDK need to dedicate a significant number of CPU cores just to packet processing. These expensive CPU cores spin in loops at GHz rates, basically doing nothing while waiting for packets to arrive.
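The poll-mode pattern, and its hidden CPU cost, can be illustrated with a toy receive loop. This is plain Python standing in for a poll-mode driver; `rx_burst` here is a simplified stand-in for a PMD receive call, not DPDK's actual API:

```python
from collections import deque

rx_queue = deque()  # stand-in for a NIC receive ring

def rx_burst(queue, max_pkts=32):
    """Drain up to max_pkts packets; returns immediately if the ring is
    empty, just like a poll-mode driver (no interrupt, no blocking)."""
    burst = []
    while queue and len(burst) < max_pkts:
        burst.append(queue.popleft())
    return burst

# A dedicated core runs this loop forever; most iterations find nothing
# and simply burn cycles, which is the CPU overhead described above.
rx_queue.extend(range(100))   # pretend 100 packets arrived
empty_polls = processed = 0
for _ in range(1000):         # a real PMD loop would be `while True:`
    pkts = rx_burst(rx_queue)
    if pkts:
        processed += len(pkts)
    else:
        empty_polls += 1

print(processed, empty_polls)  # 100 packets handled, 996 wasted polls
```

Even in this toy run, the overwhelming majority of loop iterations are empty polls; on a real server, those cycles belong to a core that can no longer run VMs or applications.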
As a leader in high-performance server interconnect, Mellanox includes Poll Mode Driver (PMD) support for DPDK in its NICs, and we set a record of more than 90 Mpps of DPDK performance on our ConnectX®-4 100G NIC. But we have an alternative, and better, way to solve the NFV packet performance challenge: Accelerated Switching and Packet Processing (ASAP2). This solution combines the performance and efficiency of server/storage networking hardware, the NIC (Network Interface Card), with the flexibility of virtual switching software to deliver software-defined networks, resulting in the highest total infrastructure efficiency, deployment flexibility and operational simplicity.
Starting with the ConnectX-3 Pro series of NICs, Mellanox has designed an embedded switch (eSwitch) into its NIC silicon. This eSwitch is a flow-based switching engine capable of performing Layer-2 (L2) switching for the different VMs running on the server, with higher performance and better security and isolation. eSwitch capabilities were enhanced in the current generation of ConnectX-4 NICs to perform packet classification and overlay virtual network tunneling protocol processing, specifically VXLAN encapsulation and de-capsulation. And in the latest ConnectX-5 NIC, the eSwitch can handle Layer-3 (L3) operations such as header rewrite, MPLS operations such as label push/pop, and even flexible, customer-defined parsing and header rewrite of vendor-specific headers. The ASAP2 accelerator is built on top of the eSwitch NIC hardware and allows either the entire virtual switch, or significant portions of virtual switch or distributed virtual router (DVR) operations, to be offloaded to the Mellanox NIC, all of which achieves greatly improved packet performance with significantly reduced CPU overhead. At the same time, ASAP2 keeps the SDN control plane intact: the SDN controller still communicates with the virtual switches to pass down network control information. This is exactly how modern networking devices are built, with a software control and management plane and a hardware data plane.
ASAP2 easily beats DPDK in three critical ways:
Everyone intuitively understands that hardware performs better than software, but how much better? We compared packet performance in the ASAP2 scenario, where the data path of Open vSwitch (OVS) is offloaded to the Mellanox ConnectX-4 Lx NIC, and in the DPDK-accelerated OVS scenario, where OVS runs entirely in user space, using DPDK libraries to bypass the hypervisor kernel to boost packet performance.
The contrast in packet performance is stark: with a small number of flows and a 64-byte packet stream, ASAP2 reached 33 Mpps on a 25G interface, where the theoretical maximum packet rate is 37.2 Mpps. For DPDK-accelerated OVS, the best performance was 8-9 Mpps, less than 30 percent of what ASAP2 can deliver.
When we scale up the number of flows, the contrast is even more apparent: ASAP2 delivers roughly 10X the packet performance of DPDK-accelerated OVS. With 60,000 flows, and VXLAN encap/decap handled in the ConnectX-4 Lx eSwitch, ASAP2 achieves 18 Mpps, while OVS over DPDK achieves 1.9 Mpps with the same number of flows. It is worth mentioning that the OVS over DPDK configuration used standard VLAN, as configuring VXLAN for OVS over DPDK was not trivial; we expect that with VXLAN, OVS over DPDK performance would be even lower.
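To put the gap in perspective, here is the arithmetic behind the comparisons above (all figures are the measured numbers quoted in the text):

```python
# Small-flow, 64-byte case on a 25G interface (Mpps)
asap2, dpdk_best, line_rate = 33.0, 9.0, 37.2
print(f"ASAP2 reaches {asap2 / line_rate:.0%} of line rate")   # 89%
print(f"DPDK-OVS delivers {dpdk_best / asap2:.0%} of ASAP2")   # 27%

# 60,000-flow case, VXLAN offloaded in the eSwitch (Mpps)
asap2_60k, dpdk_60k = 18.0, 1.9
print(f"ASAP2 advantage at 60K flows: {asap2_60k / dpdk_60k:.1f}x")  # 9.5x
```

In other words, offload keeps the NIC within about 11 percent of theoretical line rate at 64 bytes, while the software data path falls an order of magnitude behind once the flow table grows.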
In all the above scenarios, DPDK-accelerated OVS consumes four CPU cores, while ASAP2 consumes none, zero CPU cores, while delivering significantly higher performance with a software-defined OVS control plane. The CPU overhead of DPDK becomes unbearable as we move to 25G, 50G or 100G server connectivity: would you want to dedicate 10 or 15 CPU cores to packet processing? If so, you have hardly any CPU cores left to do anything else. For cloud service providers, this directly affects the top line, because the more VMs or containers they can run on their servers, the more money they can make. If they allocate a significant number of CPU cores to DPDK, they can spin up fewer VMs on their servers and make less money!
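As a hypothetical illustration of that top-line impact (the server size and VM flavor below are our assumptions, not figures from the measurements above):

```python
CORES_TOTAL = 32    # hypothetical dual-socket server
VCPUS_PER_VM = 2    # hypothetical small VM flavor

def sellable_vms(cores_reserved_for_networking: int) -> int:
    """VMs that fit on the server after reserving cores for packet I/O."""
    return (CORES_TOTAL - cores_reserved_for_networking) // VCPUS_PER_VM

with_asap2 = sellable_vms(0)  # offload: no host cores reserved
with_dpdk = sellable_vms(4)   # four cores pinned to DPDK polling
print(with_asap2, with_dpdk)  # 16 vs 14 VMs: 12.5% less sellable capacity
```

The exact numbers will vary with server and VM sizing, but the direction is always the same: every core pinned to polling is a core that cannot be sold.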
In his Netdev Conference keynote presentation, Fast Programmable Networks & Encapsulated Protocols, David Miller, primary maintainer of the Linux networking stack, made his audience repeat three times: "DPDK is NOT Linux." He wanted to make sure people understand that DPDK is a separate, BSD-licensed codebase maintained outside the Linux kernel. And if you are using a commercial DPDK-accelerated virtual switch solution, there is an additional cost associated with that.
But there is absolutely no additional charge associated with ASAP2. ConnectX-4 Lx is a high-volume NIC that we are shipping in significant quantities to our hyperscale customers, and it is priced very competitively against other high-volume NICs. All changes related to OVS Offload (using ASAP2 to offload the OVS data plane to NIC hardware) have been upstreamed to the open source communities, including OVS and Linux. OVS Offload is transparent to applications: no change is needed at the application level to take advantage of ASAP2. Mellanox does not charge a cent for the ASAP2 accelerator, and it comes standard on the latest generations of our ConnectX NICs.
Getting the Mellanox ConnectX-4 Lx is easy: it is carried by many of our server OEM partners, including Lenovo, Dell and HP. So try ASAP2 ASAP, and enhance your NFV performance and efficiency today!