During the last couple of years, the networking industry has invested a lot of effort into developing Software-Defined Networking (SDN) technology, which is drastically changing data center architecture and enabling large-scale clouds without significantly escalating the TCO (Total Cost of Ownership).
The secret of SDN is not that it enables control of data center traffic via software (it's not as if IT managers were using screwdrivers to manage the network before), but rather that it decouples the control path from the data path. This represents a major shift from the traditional data center networking architecture and therefore offers agility and better economics in modern deployments.
For readers who are not familiar with SDN, a simple example can demonstrate the efficiency that SDN provides: imagine a traffic light that makes its own decisions as to when to change and sends data to the other lights. Now imagine replacing that with a centralized control system that takes a global view of the traffic pattern throughout the entire city and can therefore make smarter decisions on how to route the traffic.
The centralized control unit tells each of the lights what to do (using a standard protocol), reducing the complexity of the local units while increasing overall agility. For example, in an emergency, the system can reroute traffic and allow rescue vehicles faster access to the source of the issue.
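The control/data-path split described above can be sketched in a few lines of code. This is a conceptual toy model, not any real controller or switch API; all class and method names here are hypothetical, chosen only to illustrate the idea that switches hold simple match/action tables while a central controller with a global view programs them.

```python
class Switch:
    """Data plane: a simple match/action table with no local decision logic."""
    def __init__(self, name):
        self.name = name
        self.flow_table = {}              # destination -> output port

    def install_rule(self, dst, out_port):
        self.flow_table[dst] = out_port

    def forward(self, dst):
        # A real switch would punt unknown destinations to the controller.
        return self.flow_table.get(dst, "to-controller")


class Controller:
    """Control plane: global view of the topology, pushes rules down."""
    def __init__(self, switches):
        self.switches = switches

    def program_path(self, dst, hops):
        # hops: list of (switch, out_port) pairs along the chosen path
        for switch, port in hops:
            switch.install_rule(dst, port)


s1, s2 = Switch("s1"), Switch("s2")
ctrl = Controller([s1, s2])
ctrl.program_path("10.0.0.7", [(s1, 2), (s2, 5)])
print(s1.forward("10.0.0.7"))   # 2 (rule installed by the controller)
print(s2.forward("10.0.0.9"))   # to-controller (no rule for this flow)
```

The point of the sketch is that the switches stay fast and simple: all path computation lives in one place, so changing routing policy (the emergency-rerouting example above) means updating the controller, not every device.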
SDN will help extend the life of the data center infrastructure. For example, if an IT manager needs to add support for new protocols, only the central control needs to be replaced, which is far less expensive than replacing the switch or router systems that have the control embedded inside. SDN's added value has been so beneficial that it has been extended from the internal fabric of the data center to the Wide Area Network (WAN), where NFV (Network Function Virtualization) standards are being finalized in the hopes of eliminating the need for function-specific systems.
The concept behind SDN, decoupling the data path from the control path in the switch, isn't a new one, though. It has actually been part of the InfiniBand standard from day one. In InfiniBand, control is managed by a central utility called the Subnet Manager (SM), which discovers, configures, activates, and manages the subnet (with up to 48K nodes in a single subnet), thereby helping to keep the InfiniBand switches fast and simple.
What makes InfiniBand even more efficient is the use of RDMA (Remote Direct Memory Access) to move data from server to server and from server to storage. When running over RDMA, the data transport layer is completely offloaded from the CPU to the input/output (IO) controller, where it "runs on silicon".
This enables very low latency, higher system efficiency, and extreme scalability. Recently, the RDMA concept has been adopted by the Ethernet community with the publication of the RoCE (RDMA over Converged Ethernet) standard, and massive deployments are already under way, such as in Microsoft Azure.
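The defining property of an RDMA read is that the initiator pulls bytes directly out of a remote, pre-registered memory region without ever invoking the remote CPU. The toy model below illustrates just that property; it is a conceptual sketch only (real RDMA uses the verbs interface, e.g. posting an RDMA READ work request to the NIC, not Python), and all names and the `rkey` value are illustrative.

```python
class MemoryRegion:
    """A registered, pinned buffer that remote peers may access."""
    def __init__(self, buf):
        self.buf = buf
        self.rkey = 0x1A2B                # remote key granting access (illustrative)


class RemoteNode:
    def __init__(self):
        self.cpu_invocations = 0          # would grow if the remote CPU handled IO
        self.region = MemoryRegion(bytearray(b"payload-from-remote-memory"))


def rdma_read(remote, rkey, offset, length):
    # Hardware-path transfer: access is validated by the rkey;
    # note that no method on the remote "CPU" is ever called.
    assert rkey == remote.region.rkey
    return bytes(remote.region.buf[offset:offset + length])


node = RemoteNode()
data = rdma_read(node, 0x1A2B, 0, 7)
print(data)                      # b'payload'
print(node.cpu_invocations)      # 0 -- the remote CPU did no work
```

In real hardware this validation and copy happen entirely in the NIC, which is what makes the latency both low and deterministic.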
The benefits of running an RDMA offload engine are pretty clear. Functions that run on silicon are much faster than those that use the CPU (and the response time is also much more deterministic). However, the real benefit is in the much higher ROI (Return On Investment). With an average cost of $2,000 per CPU, about 50% of the server cost is due to the CPUs.
The IO controllers, on the other hand, cost in the range of $100. So why use expensive CPU cycles when the same task can run on a much lower cost device, which can also execute the task faster? This also helps increase overall server efficiency, so that other expensive components, like SSDs, do not need to spend extra cycles waiting for the CPU. Overall, using advanced IO controllers with offload engines results in a much more balanced system that increases the overall infrastructure efficiency.
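The cost argument above is simple arithmetic, made explicit below using the figures from the text (about $2,000 per CPU, CPUs at roughly 50% of server cost, IO controllers around $100). The two-CPU server configuration is an illustrative assumption, not a figure from the text.

```python
# Back-of-the-envelope comparison using the article's figures.
cpu_price = 2_000                         # ~$2,000 per CPU (from the text)
io_controller_price = 100                 # ~$100 per IO controller (from the text)
cpus_per_server = 2                       # illustrative assumption

cpu_cost = cpu_price * cpus_per_server    # $4,000 in CPUs
server_cost = cpu_cost / 0.50             # CPUs are ~half the server cost
print(server_cost)                        # 8000.0

# Cycles freed on the CPU are cycles running on hardware that costs
# ~20x less per device than a single CPU.
print(cpu_price / io_controller_price)    # 20.0
```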
Although the use of SDN enables higher agility, it puts a heavy load on the CPU, where expensive cycles are used to run functions like VXLAN or NVGRE encapsulation. This doesn't just increase CPU overhead; it also affects overall system performance, since traditional function-specific systems have been optimized to run those functions faster.
Therefore, in order to improve efficiency, advanced IO controllers with offload engines should be used. For example, the ConnectX-3 Pro includes VXLAN and NVGRE offload engines that enable the IO controller to run at full line rate with very low CPU overhead.
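To see what the offload engine is actually adding and stripping on every packet, here is a sketch of the 8-byte VXLAN header defined in RFC 7348: a flags byte with the I bit set (meaning the VNI is valid), reserved fields, and a 24-bit VXLAN Network Identifier (VNI). Doing this encapsulation in hardware is exactly what spares the CPU.

```python
import struct

def vxlan_header(vni):
    """Build the 8-byte VXLAN header (RFC 7348) for a given 24-bit VNI."""
    assert 0 <= vni < 2**24
    flags = 0x08                          # I flag set: VNI field is valid
    # First word: flags in the top byte, 24 reserved bits.
    # Second word: 24-bit VNI in the top bytes, 8 reserved bits.
    return struct.pack("!II", flags << 24, vni << 8)

hdr = vxlan_header(5000)
print(hdr.hex())    # 0800000000138800
print(len(hdr))     # 8
```

The outer UDP/IP encapsulation around this header is likewise built by the offload engine, which is why the host sees no per-packet tunneling cost.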
Another good example is the ConnectX-3 Pro eSwitch (embedded switch), which is capable of performing Layer-2 (L2) switching for the various VMs running on the server, replacing the need to run Open vSwitch (OVS) in software and thereby dramatically reducing software overheads and costs.
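The L2 switching that the eSwitch performs in hardware boils down to MAC learning and forwarding. The toy model below shows that logic; it is a conceptual sketch of what the silicon does (names and port numbers are illustrative), not the eSwitch's actual interface or the OVS code path it replaces.

```python
class LearningSwitch:
    """Toy MAC-learning L2 switch, as performed in hardware by an eSwitch."""
    def __init__(self):
        self.mac_table = {}               # source MAC -> ingress port

    def handle_frame(self, src_mac, dst_mac, in_port):
        self.mac_table[src_mac] = in_port         # learn where src lives
        # Known destination: forward out a single port; unknown: flood.
        return self.mac_table.get(dst_mac, "flood")


sw = LearningSwitch()
print(sw.handle_frame("aa:aa", "bb:bb", 1))   # flood (bb:bb not yet learned)
print(sw.handle_frame("bb:bb", "aa:aa", 2))   # 1 (aa:aa was learned on port 1)
```

Every frame handled by this table in hardware is a frame the software vSwitch never has to touch, which is where the CPU savings come from.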
The overlay network functionality and the embedded OVS switch offload engines are just a couple of examples; more networking functions are expected to be offloaded from the CPU to IO controllers, resulting in more balanced systems that enable higher performance at lower cost.