Software Defined Networking, Done Right Part 2

Cloud Computing, Cloud Networking, hardware offload, Network Function Virtualization (NFV), OpenFlow, Software Defined Networking, VTEP, VXLAN

Read Software Defined Networking, Done Right Part 1 and Part 3.


Part 2: Innovations that can Move SDN from Depths of Disillusionment to Peak of Enlightenment

In Part 1 of this SDN blog series, The State-of-the-SDN Art, I reviewed the mainstream SDN deployment models and discussed the pros and cons of each. In Part 2, I will discuss some of the key Mellanox innovations that can enhance the performance and efficiency of SDN deployments, especially large-scale deployments. As Andrew Lerner blogged about the 2016 Gartner Networking Hype Cycle, SDN is now firmly entrenched in the trough of disillusionment. He specifically noted that:

“During 2015, we started to see production adoption of SDN solutions, though broad deployments are still relatively rare. A variety of both established vendors and startups continue to develop SDN technologies, but full, robust, end-to-end enterprise-ready solutions are not yet fully established on the market.”

In some sense, the technologies discussed below are key to moving SDN through the hype cycle, from the depths of disillusionment to the peak of enlightenment.

VXLAN Offload on ConnectX® NICs

In the good old days, when network virtualization was realized through VLANs, achieving line-rate performance on the server host was possible because the server could offload some of the CPU-intensive packet processing operations, such as checksum calculation, Receive Side Scaling (RSS), and Large Receive Offload (LRO), into the NIC hardware. This both improved network I/O performance and reduced CPU overhead, ultimately making the infrastructure run more efficiently.

Now, with overlay SDN, a tunneling protocol such as VXLAN, NVGRE, or GENEVE is introduced to encapsulate the original payload. NICs that don’t recognize these new packet header formats lose even the most basic offloads, forcing all packet-manipulation operations to be done in software on the CPU. This can cause significant network I/O performance degradation and excessive CPU overhead, especially as server I/O speeds evolve from 10Gb/s to 25, 40, 50, or even 100Gb/s.

Starting with the ConnectX®-3 Pro series of NICs, Mellanox supports VXLAN hardware offload, which includes stateless offloads such as checksum, RSS, and LRO for VXLAN/NVGRE/GENEVE packets. With VXLAN offload, I/O performance and CPU overhead can be restored to levels similar to VLAN.
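On a Linux host, the NIC’s tunnel offload capabilities can be inspected and toggled with ethtool. A minimal sketch, assuming a hypothetical interface name `eth0` (the exact feature names listed depend on driver and kernel version):

```shell
# Inspect which tunnel-related offloads the NIC exposes
ethtool -k eth0 | grep -E 'tnl|tunnel'
# On a VXLAN-offload-capable NIC you would typically see features such as:
#   tx-udp_tnl-segmentation        (TSO for VXLAN-encapsulated traffic)
#   tx-udp_tnl-csum-segmentation   (checksum offload for encapsulated segments)

# Enable the VXLAN segmentation offloads explicitly if they are off
ethtool -K eth0 tx-udp_tnl-segmentation on tx-udp_tnl-csum-segmentation on
```

These are hardware-dependent configuration commands; on a NIC without tunnel offload support the features will be reported as `off [fixed]`.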

The following two graphs show the bandwidth and CPU overhead comparison in three scenarios: VLAN, VXLAN without offload, and VXLAN with offload. VXLAN offload results in greater than 2X throughput improvement with approximately 50 percent lower CPU overhead.


VXLAN offload is supported at the OS/hypervisor kernel level for Linux, Microsoft Hyper-V, and VMware ESXi, and does not depend on the type of virtual switch or router used.

ASAP2 (Accelerated Switching and Packet Processing) on ConnectX-4 NICs

Starting with the ConnectX®-4 series of NICs, Mellanox supports VTEP capability in server NIC hardware through the ASAP2 feature. With a pipeline-based programmable eSwitch built into the NIC, ConnectX-4 can handle a large portion of packet processing operations in hardware, including VXLAN encapsulation/decapsulation, packet classification based on a set of common L2–L4 header fields, QoS, and Access Control Lists (ACLs). Built on top of these enhanced NIC hardware capabilities, the ASAP2 feature provides a programmable, high-performance, and highly efficient hardware forwarding plane that works seamlessly with an SDN control plane. It overcomes the performance degradation associated with software VTEPs, as well as the complexity of coordinating between server and ToR devices in the case of hardware VTEPs.

There are two main ASAP2 deployment models: ASAP2 Direct and ASAP2 Flex.


ASAP2 Direct

In this deployment model, VMs get direct access to the Mellanox ConnectX-4 NIC hardware through SR-IOV Virtual Functions (VFs) to achieve the highest network I/O performance in a virtualized environment.

One issue with legacy SR-IOV implementations is that they bypass the hypervisor and virtual switch completely; the virtual switch is not even aware of the VMs running in SR-IOV mode. As a result, the SDN control plane cannot influence the forwarding plane for VMs using SR-IOV on the server host.

ASAP2 Direct overcomes this issue by offloading forwarding rules from the virtual switch to the ConnectX-4 eSwitch forwarding plane. Take Open vSwitch (OVS), one of the most commonly used virtual switches, as an example. Combining an SDN control plane, in which OVS communicates with a corresponding SDN controller, with a NIC hardware forwarding plane offers the best of both worlds: software-defined, flexible network programmability and high network I/O performance at state-of-the-art speeds from 10G to 25/40/50/100G. By letting the NIC hardware take the I/O processing burden off the CPU, the CPU resources can be dedicated to application processing, resulting in higher system efficiency.
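The host-side setup for this model can be sketched with standard Linux and OVS tooling; the PCI address, interface name, and VF count below are placeholders for illustration:

```shell
# 1. Create SR-IOV Virtual Functions on the NIC (interface name is a placeholder)
echo 2 > /sys/class/net/enp3s0f0/device/sriov_numvfs

# 2. Put the NIC's embedded switch (eSwitch) into switchdev mode, which
#    exposes a representor netdev per VF that the virtual switch can manage
devlink dev eswitch set pci/0000:03:00.0 mode switchdev

# 3. Tell Open vSwitch to offload datapath flows to the NIC via TC
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch
```

With this in place, OVS keeps the control-plane view of every VM port (via the representors) while the matched flows are executed in the eSwitch hardware.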

ASAP2 Direct offers excellent small-packet performance beyond raw bit throughput. Mellanox benchmarks show that on a server with a 25G interface, ASAP2 Direct achieves 33 million packets per second (MPPS) with zero CPU cores consumed for a single flow, and about 25 MPPS with 15,000 flows performing VXLAN encap/decap in the ConnectX-4 Lx eSwitch.

ASAP2 Flex

In this deployment model, VMs run in para-virtualized mode and still go through the virtual switch for their network I/O needs. However, through a set of open APIs such as Linux Traffic Control (TC) or the Data Plane Development Kit (DPDK), the virtual switch can offload some of the CPU-intensive packet processing operations, including VXLAN encapsulation/decapsulation and packet classification, to the Mellanox ConnectX-4 NIC hardware. This is a roadmap feature and the availability date will be announced in the future.

OpenFlow support on Spectrum Switches

Spectrum is Mellanox’s 10/25/40/50 and 100Gb/s Ethernet switch solution that is optimized for SDN to enable flexible and efficient data center fabrics with leading port density, low latency, zero packet loss, and non-blocking traffic.

From the ground up, at the switch silicon level, Spectrum is designed with a very flexible processing pipeline that can accommodate a programmable OpenFlow pipeline, allowing packets to be sent to subsequent tables for further processing and metadata to be communicated between OpenFlow tables. This makes Spectrum an ideal choice for supporting the OpenFlow 1.3 specification.

In addition, Spectrum is an OpenFlow-hybrid switch that supports both OpenFlow operation and normal Ethernet switching operation. Users can configure OpenFlow at the port level, assigning some Spectrum ports to perform OpenFlow-based packet processing and others to perform normal Ethernet switching. Spectrum also provides a classification mechanism to direct traffic within a single switch port to either the OpenFlow pipeline or the normal Ethernet processing pipeline.
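The multi-table pipeline with metadata described above is standard OpenFlow 1.3. As a generic illustration (using `ovs-ofctl` flow syntax rather than any Spectrum-specific CLI; bridge name and port numbers are placeholders), table 0 can classify a packet, record a tag in the metadata register, and hand it to table 1 for the forwarding decision:

```shell
# Table 0: classify by ingress port, write a tag into metadata, continue
ovs-ofctl -O OpenFlow13 add-flow br0 \
    "table=0,in_port=1,actions=write_metadata:0x1/0xff,goto_table:1"

# Table 1: match on the metadata written by table 0 and forward
ovs-ofctl -O OpenFlow13 add-flow br0 \
    "table=1,metadata=0x1/0xff,actions=output:2"
```

This goto_table/metadata pattern is exactly what a multi-table-capable pipeline like Spectrum’s must execute in hardware.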

VTEP support in Spectrum Switches

Mellanox Spectrum supports VTEP gateway functionalities that make it ideal to be deployed as:

  • Layer 2 VTEP gateway between virtualized networks using VXLAN and non-virtualized networks using VLAN in the same data center or between data centers.
  • Layer 2 VTEP gateway that provides a high-performance connection to virtualized servers across Layer 3 networks and enables Layer 2 features such as VM live migration (vMotion). On virtualized server hosts where the NIC does not have VTEP capability and a software VTEP can’t meet the network I/O performance requirement, the VTEP can be implemented on the Mellanox Spectrum ToR. In some cases, the application running in the VM may require advanced networking features such as Remote Direct Memory Access (RDMA) for inter-VM communication or access to storage. RDMA needs to run in SR-IOV mode on virtualized servers, and when a Mellanox NIC is not present, the VTEP is best implemented in the ToR.
  • Layer 3 VTEP gateway that provides VXLAN routing capability for traffic between different VXLAN virtual networks, or for north-south traffic between a VXLAN network and a VPN network or the Internet. This feature is supported in Spectrum hardware; the software to enable it is still under development.
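The Layer 2 VTEP gateway function, which Spectrum performs in hardware, can be sketched with the equivalent Linux software configuration: bridge a local VLAN into a VXLAN VNI. The names, VLAN/VNI numbers, and source address below are placeholders:

```shell
# Create the VXLAN tunnel endpoint (VNI 100, standard UDP port 4789);
# nolearning assumes MAC reachability is distributed by a controller
ip link add vxlan100 type vxlan id 100 local 192.0.2.1 dstport 4789 nolearning

# Bridge the VXLAN device together with a local VLAN sub-interface,
# stitching the non-virtualized VLAN 10 segment into virtual network 100
ip link add br100 type bridge
ip link add link eth0 name eth0.10 type vlan id 10
ip link set vxlan100 master br100
ip link set eth0.10 master br100
ip link set vxlan100 up && ip link set eth0.10 up && ip link set br100 up
```

A hardware VTEP on the ToR performs this same VLAN-to-VNI mapping at line rate, without consuming server CPU.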

Spectrum is an Open Ethernet switch and can support multiple switch operating systems running over it. The Layer 2 VTEP gateway features will first be available in Cumulus Linux on Spectrum, and subsequently in MLNX-OS.

In the third blog, I will show how to put these innovations together to deploy the most efficient SDN in production.
