In 1967, Gene Amdahl developed a formula that estimates the overall speedup, and hence the efficiency, of a computer system from the fraction of the processing that can be parallelized and the degree of parallelism that is available in the specific system.
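As a quick illustration, Amdahl's formula can be written as speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can be parallelized and n is the number of parallel workers. The short Python sketch below evaluates it; the function names are illustrative only and are not taken from any library.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def parallel_efficiency(parallel_fraction: float, workers: int) -> float:
    """Speedup divided by the number of workers (1.0 means perfect scaling)."""
    return amdahl_speedup(parallel_fraction, workers) / workers


if __name__ == "__main__":
    # A job that is 95% parallelizable, run on a growing number of workers.
    for n in (1, 8, 64, 512):
        print(n, round(amdahl_speedup(0.95, n), 2), round(parallel_efficiency(0.95, n), 3))
```

Even with 95% of the work parallelized, the serial 5% caps the speedup at 20x no matter how many workers are added, which is why every additional source of non-parallel time matters.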
At that time, deeper performance analysis had to take into consideration the efficiency of the three main hardware resources needed for a computation job: compute, memory and storage.
On the compute side, efficiency has to be measured by how many threads can run in parallel (which depends on the number of cores). The memory size determines the percentage of IO operations that need to access the storage, which significantly slows the execution time and reduces the overall system efficiency.
Analyzing those three hardware resources worked very well until the early 2000s. At that time, the computer industry started to use grid computing or, as it is known today, scale-out systems. The benefits of the scale-out architecture are clear. It enables building higher-performance systems that are easy to scale and have built-in high availability, at a lower cost. However, the efficiency of those systems depends heavily on the performance and the resiliency of the interconnect solution.
The importance of the interconnect became even greater in the virtualized data center, where the amount of east-west traffic continues to grow (as more parallel work is being done). So, if we want to use Amdahl’s law to analyze the efficiency of a scale-out system, then in addition to the three traditional items (compute, memory and storage), a fourth item, the interconnect, has to be considered as well.
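One simple way to picture the interconnect as a fourth term is to add a communication cost to the denominator of Amdahl's formula. The sketch below is only an illustrative model, not part of Amdahl's original work: it assumes a hypothetical `comm_fraction` parameter, the time spent waiting on the interconnect expressed as a fraction of the single-node run time, and it assumes that cost is paid only when more than one node is used.

```python
def scale_out_speedup(parallel_fraction: float, nodes: int, comm_fraction: float) -> float:
    """Amdahl-style speedup with an added interconnect term.

    comm_fraction is a hypothetical modelling parameter: interconnect time
    as a fraction of the single-node run time. It is an assumption of this
    sketch, not a measured quantity.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial = 1.0 - parallel_fraction
    # A single node does no east-west communication in this model.
    comm = comm_fraction if nodes > 1 else 0.0
    return 1.0 / (serial + parallel_fraction / nodes + comm)


if __name__ == "__main__":
    # Same job, same node count, with an efficient and an inefficient interconnect.
    print(round(scale_out_speedup(0.9, 10, 0.00), 2))
    print(round(scale_out_speedup(0.9, 10, 0.05), 2))
```

In this model, interconnect time behaves like additional serial work: a few percent of communication overhead removes a noticeable share of the speedup that adding nodes was supposed to deliver.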
Originally, interconnect technologies put more emphasis on delivery of the message than on efficiency and performance. This is why the original Ethernet standard was developed to rely on running heavy protocol stacks on the server CPU, along with a best-effort delivery service when sending packets. Best effort means sending the packet without checking whether it can actually arrive and, if it does not, relying on a mechanism to retransmit it.
Both affect overall system efficiency. Using the CPU to run the protocol stack means that fewer “compute cycles” are available for the job itself. In addition, the best-effort delivery mechanism has a significant drawback: retransmissions increase the amount of traffic and thus reduce the interconnect efficiency.
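The cost of best-effort delivery can be roughed out with a deliberately simple model. The sketch below assumes that each packet is lost independently with probability `loss_probability` and is resent until it arrives; real networks lose packets in bursts and real protocols retransmit differently, so treat the numbers as an illustration only.

```python
def expected_transmissions(loss_probability: float) -> float:
    """Average number of sends per delivered packet under independent loss."""
    if not 0.0 <= loss_probability < 1.0:
        raise ValueError("loss_probability must be in [0, 1)")
    # Geometric distribution: mean number of attempts until the first success.
    return 1.0 / (1.0 - loss_probability)


def useful_traffic_fraction(loss_probability: float) -> float:
    """Share of transmitted packets that carry data for the first time."""
    if not 0.0 <= loss_probability < 1.0:
        raise ValueError("loss_probability must be in [0, 1)")
    return 1.0 - loss_probability


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.05, 0.2):
        print(p, round(expected_transmissions(p), 3), useful_traffic_fraction(p))
```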
As such, the need for higher-efficiency scale-out systems and the need to provide (near) real-time response to mobile users drove new interconnect technologies that are able to handle these requirements as well as the size of the (big) data. Many improvements have been made in all seven layers of the OSI model.
One of the new mechanisms that has been developed is offloading tasks that used to be done by the CPU to the IO controllers. Among them is Remote Direct Memory Access (RDMA), where the transport task is totally offloaded to the IO controller, with “Zero Copy” and minimal involvement of the CPU or the server’s OS.
Today, the RDMA mechanism is part of the InfiniBand standard as well as the RDMA over Converged Ethernet (RoCE) standard, both of which are used in many modern scale-out systems and data centers. Leading scale-out database appliances, including Oracle’s Exadata, IBM pureScale, Teradata and Microsoft’s (SQL-based) PDW, use an RDMA-based interconnect built on Mellanox’s end-to-end RDMA solutions.
The same technology has been used in ultra-large cloud infrastructure like Azure, where 40GE with RoCE has been used to make access to storage faster and cheaper. See the reference at 21:12 in the video of the keynote presentation given by Albert Greenberg at the recent Open Networking Summit (ONS2014).
It is interesting to mention that in some cases an RDMA-based interconnect outperforms the memory. In a presentation at TechEd’13, Jose Barreto, Principal Program Manager at Microsoft, described a case study in which a client was connected to SMB Direct based file storage using 4 x Mellanox ConnectX-3 FDR (56Gb/s) InfiniBand IO adapters. He couldn’t pass more than 18.3GB/sec, although each card can support more than 5.5GB/sec (a total of more than 22GB/sec). After deep analysis, he found that the performance was limited by the DDR3-2133 (PC3-17000) memory that was used, which supports a maximum of 18.3GB/sec.
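The arithmetic behind that case study is a simple bottleneck calculation: the achievable throughput is the smaller of what the adapters can deliver together and what the memory can sustain. A minimal sketch, using only the figures quoted above (the function name is illustrative):

```python
def achievable_throughput(adapters: int, per_adapter_gb_s: float, memory_gb_s: float) -> float:
    """Throughput in GB/s, limited by the slower of the network and the memory."""
    network_gb_s = adapters * per_adapter_gb_s
    return min(network_gb_s, memory_gb_s)


if __name__ == "__main__":
    # 4 adapters at 5.5 GB/s each against memory limited to 18.3 GB/s.
    network_total = 4 * 5.5
    limit = achievable_throughput(4, 5.5, 18.3)
    print(network_total, limit)
```

With these inputs the four adapters could carry 22 GB/s in aggregate, so the 18.3 GB/s memory limit is the binding constraint.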
Overlay network technologies like NVGRE or VXLAN are other functions that have recently been offloaded from the CPU to the IO controller. Implementing those standards in software is possible but very demanding in terms of CPU cycles; using a hardware offload like the one in ConnectX-3 Pro enables 5X higher bandwidth with 4X less CPU overhead.
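To give a feel for the per-packet work involved, the sketch below builds the 8-byte VXLAN header defined in RFC 7348 and adds up the encapsulation overhead for an outer IPv4 packet without a VLAN tag. It is only an illustration of what a software implementation has to do for every packet; it is not how any particular adapter or driver implements the offload, and the helper names are made up for this example.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid (RFC 7348)

# Outer headers added around every encapsulated frame (IPv4, no VLAN tag).
OUTER_ETHERNET_BYTES = 14
OUTER_IPV4_BYTES = 20
OUTER_UDP_BYTES = 8
VXLAN_HEADER_BYTES = 8


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulation_overhead() -> int:
    """Bytes added to each inner Ethernet frame by VXLAN encapsulation."""
    return OUTER_ETHERNET_BYTES + OUTER_IPV4_BYTES + OUTER_UDP_BYTES + VXLAN_HEADER_BYTES


if __name__ == "__main__":
    print(vxlan_header(5000).hex())
    print(encapsulation_overhead())
```

Besides adding these headers, the encapsulation hides the inner packet from NIC features that normally operate on it, such as checksum and segmentation offloads, which is a large part of why doing the work in software is expensive.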
But the industry doesn’t stop at the networking level. Reaching the next level of scalability and performance requires a new generation of data and application accelerations. This is what has been added to Mellanox’s most recent IO controller, Connect-IB. The controller supports hardware checking of T10 Data Integrity Field / Protection Information (DIF/PI) and other signature types, reducing CPU overhead and accelerating the data to the application.
Signature translation and handover are also done by the adapter, further reducing the load on the CPU. Consolidating compute and storage over FDR InfiniBand with Connect-IB achieves superior performance while reducing data center costs and complexities.
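For readers unfamiliar with T10 DIF, the sketch below shows the kind of work the signature check involves when done in software. It assumes the common layout of 8 bytes of protection information per sector (a 16-bit CRC guard tag, a 16-bit application tag and a 32-bit reference tag, stored big-endian) and the T10-DIF CRC polynomial 0x8BB7. It is a bit-by-bit reference version for illustration, not the adapter's implementation, and the function names are hypothetical.

```python
import struct

T10_DIF_POLY = 0x8BB7  # CRC-16 polynomial used for the DIF guard tag


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16/T10-DIF: initial value 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10_DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protection_info(sector: bytes, app_tag: int, ref_tag: int) -> bytes:
    """Pack guard, application and reference tags into the 8-byte DIF tuple."""
    return struct.pack("!HHI", crc16_t10dif(sector), app_tag, ref_tag)


def verify(sector: bytes, pi: bytes, expected_ref_tag: int) -> bool:
    """Check the guard CRC and the reference tag, as a receiver would."""
    guard, _app_tag, ref_tag = struct.unpack("!HHI", pi)
    return guard == crc16_t10dif(sector) and ref_tag == expected_ref_tag


if __name__ == "__main__":
    sector = bytes(range(256)) * 2  # one 512-byte sector of sample data
    pi = protection_info(sector, app_tag=0, ref_tag=42)
    print(len(pi), verify(sector, pi, expected_ref_tag=42))
```

Running a CRC over every sector on the CPU consumes cycles in proportion to the storage bandwidth, which is why moving the check into the IO controller becomes more attractive as link speeds rise.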
As the amount of data continues to grow, it will continue to drive ultra-scale-out data centers. The interconnect will become (and actually already is) a critical component that affects data center efficiency, and more functions are expected to be offloaded to the IO controllers.
Author: Motti Beck is Director of Marketing, EDC market segment at Mellanox Technologies. Before joining Mellanox, Motti was a founder of several startup companies, including BindKey Technologies, which was acquired by DuPont Photomask (today Toppan Printing Company LTD), and Butterfly Communications, which was acquired by Texas Instruments. He was previously a Business Unit Director at National Semiconductor. Motti holds a B.Sc. in computer engineering from the Technion – Israel Institute of Technology.
Follow Motti on Twitter: @MottiBeck