Introduction to InfiniBand

 
InfiniBand is a network communications protocol that offers a switch-based fabric of point-to-point bi-directional serial links between processor nodes, as well as between processor nodes and input/output nodes, such as disks or storage. Every link has exactly one device connected to each end of the link, such that the characteristics controlling the transmission (sending and receiving) at each end are well defined and controlled.

 

InfiniBand creates a private, protected channel directly between the nodes via switches, and facilitates data and message movement without CPU involvement with Remote Direct Memory Access (RDMA) and Send/Receive offloads that are managed and performed by InfiniBand adapters. The adapters are connected on one end to the CPU over a PCI Express interface and to the InfiniBand subnet through InfiniBand network ports on the other. This provides distinct advantages over other network communications protocols, including higher bandwidth, lower latency, and enhanced scalability.

Figure 1: Basic InfiniBand Structure
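
As a concrete illustration of this user-space access to the adapter, the following minimal sketch uses the Linux verbs library (libibverbs, not mentioned in the text above) to open an InfiniBand adapter and allocate a protection domain; it is a simplified example with error handling trimmed, not production code.

    /* Minimal sketch using libibverbs: open an InfiniBand adapter (HCA)
     * directly from user space and allocate a protection domain, the
     * container for the memory regions and queue pairs an application
     * will use on that adapter. Error handling is trimmed for brevity. */
    #include <stdio.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num;
        struct ibv_device **dev_list = ibv_get_device_list(&num);
        if (!dev_list || num == 0) {
            fprintf(stderr, "no InfiniBand devices found\n");
            return 1;
        }

        /* Open the first adapter; this handle talks to the HCA over PCIe. */
        struct ibv_context *ctx = ibv_open_device(dev_list[0]);

        /* Group the resources that are allowed to work together. */
        struct ibv_pd *pd = ibv_alloc_pd(ctx);

        printf("opened %s\n", ibv_get_device_name(dev_list[0]));

        ibv_dealloc_pd(pd);
        ibv_close_device(ctx);
        ibv_free_device_list(dev_list);
        return 0;
    }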

The InfiniBand Trade Association (IBTA), established in 1999, chartered, maintains, and furthers the InfiniBand specification, and is responsible for compliance and interoperability testing of commercial InfiniBand products. Through its roadmap, the IBTA has pushed the development of higher performance more aggressively than any other interconnect solution, ensuring an architecture that is designed for the 21st century.

 

InfiniBand is designed to enable the most efficient data center implementation. It natively supports server virtualization, overlay networks, and Software-Defined Networking (SDN).  InfiniBand takes an application-centric approach to messaging, finding the path of least resistance to deliver data from one point to another. This differs from traditional network protocols (such as TCP/IP and Fibre Channel), which use a more network-centric method for communicating.

 

Direct access means that an application does not rely on the operating system to deliver a message. With a traditional interconnect, the operating system is the sole owner of shared network resources, meaning that applications cannot have direct access to the network. Instead, applications must rely on the operating system to transfer data from the application’s virtual buffer to the network stack and onto the wire, and the operating system at the receiving end must be similarly involved, only in reverse.
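
For contrast, a conventional sockets transfer depends on the kernel at every step: the application hands its buffer to the operating system, which copies it into the network stack before the data reaches the wire. A minimal TCP sketch of that path (the peer address and port below are placeholders):

    /* Traditional, OS-mediated send: every write() traps into the kernel,
     * which copies the application's buffer into its own network stack
     * before transmitting it. Placeholder address/port; no error handling. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        struct sockaddr_in peer = {0};
        peer.sin_family = AF_INET;
        peer.sin_port = htons(12345);                     /* placeholder port */
        inet_pton(AF_INET, "192.0.2.1", &peer.sin_addr);  /* placeholder address */

        connect(fd, (struct sockaddr *)&peer, sizeof(peer));

        const char msg[] = "hello";
        write(fd, msg, sizeof(msg));   /* the copy into the kernel happens here */

        close(fd);
        return 0;
    }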

 

In contrast, InfiniBand avoids operating system involvement by bypassing the network stack to create the direct channel for communication between applications at either end. The simple goal of InfiniBand is to provide a message service for an application to communicate directly with another application or storage. Once that is established, the rest of the InfiniBand architecture works to ensure that these channels are capable of carrying messages of varying sizes, to virtual address spaces spanning great physical distances, with isolation and security.
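
In verbs terms, the endpoint of such a channel is a queue pair. The sketch below creates a reliable-connected queue pair and its completion queue; it assumes the device context and protection domain from the earlier sketch and omits connection setup (exchanging queue pair numbers and moving the queue pair through its states).

    /* Sketch: create the endpoint of an InfiniBand channel, a queue pair (QP),
     * from user space with libibverbs. Assumes ctx and pd were obtained as in
     * the earlier sketch; connection establishment is omitted. */
    #include <infiniband/verbs.h>

    struct ibv_qp *create_channel_endpoint(struct ibv_context *ctx,
                                           struct ibv_pd *pd)
    {
        /* Completion queue: the adapter reports finished work requests here. */
        struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

        struct ibv_qp_init_attr attr = {
            .send_cq = cq,
            .recv_cq = cq,
            .cap = {
                .max_send_wr  = 16,   /* outstanding send requests   */
                .max_recv_wr  = 16,   /* outstanding receive buffers */
                .max_send_sge = 1,
                .max_recv_sge = 1,
            },
            .qp_type = IBV_QPT_RC,    /* reliable connected service  */
        };

        /* The QP's send and receive queues are the application's private
         * channel into the fabric; the OS is not involved in data transfer. */
        return ibv_create_qp(pd, &attr);
    }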

 

InfiniBand is architected for hardware implementation, unlike TCP, which was designed with a software implementation in mind. InfiniBand is therefore a lighter-weight transport service than TCP: it does not need to re-order packets, because the lower-level link layer provides in-order packet delivery. The transport layer only needs to check the packet sequence and deliver packets in order.

 

Further, because InfiniBand offers credit-based flow control (where a sender node does not send data beyond the “credit” amount that has been advertised by the receive buffer on the opposite side of the link), the transport layer does not require a packet-drop mechanism such as TCP’s windowing algorithm to determine the optimal number of in-flight packets. This enables efficient products delivering 56Gb/s, and soon 100Gb/s, data rates to applications with very low latency and negligible CPU utilization.
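
The credit mechanism itself is implemented in the adapter and switch hardware, but its logic can be sketched in a few lines: the sender consumes one credit per packet and stalls when credits run out, while the receiver returns credits as it frees buffer space. The following is purely an illustrative model, not actual firmware or driver code:

    /* Illustrative model of credit-based flow control (not real firmware):
     * the sender never transmits beyond the credits the receiver has
     * advertised, so packets are never dropped for lack of buffer space. */
    typedef struct {
        int credits;   /* receive buffers the far end has advertised */
    } ib_link_t;

    int try_send_packet(ib_link_t *link)
    {
        if (link->credits == 0)
            return 0;          /* stall: no buffer available at the far end */
        link->credits--;       /* one credit is consumed per packet sent */
        /* ... transmit the packet on the wire ... */
        return 1;
    }

    void on_credit_update(ib_link_t *link, int freed_buffers)
    {
        /* The receiver advertises credits back as it drains its buffers. */
        link->credits += freed_buffers;
    }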

 

InfiniBand uses Remote Direct Memory Access (RDMA) as its method of transferring data from one end of the channel to the other. RDMA is the ability to transfer data directly between applications over the network, with no operating system involvement and with negligible CPU consumption on both sides (zero-copy transfers). The application on the receiving side simply reads the message directly from its memory; the message has been delivered without any intermediate copies.
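
In verbs terms, such a zero-copy transfer is expressed by registering a buffer with the adapter and posting an RDMA write work request carrying the peer's virtual address and remote key; the adapters then move the data without either CPU touching the payload. A sketch, assuming a connected reliable queue pair and a remote address and key exchanged out of band:

    /* Sketch of an RDMA write (zero-copy): the local adapter places buf
     * directly into the peer's registered memory. Assumes pd and a connected
     * RC queue pair qp; remote_addr and rkey were exchanged out of band. */
    #include <stdint.h>
    #include <string.h>
    #include <infiniband/verbs.h>

    int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                           uint64_t remote_addr, uint32_t rkey)
    {
        static char buf[4096];
        strcpy(buf, "payload");

        /* Register the buffer so the adapter can read it directly. */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, sizeof(buf),
                                       IBV_ACCESS_LOCAL_WRITE);

        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,
            .length = sizeof(buf),
            .lkey   = mr->lkey,
        };

        struct ibv_send_wr wr = {
            .sg_list             = &sge,
            .num_sge             = 1,
            .opcode              = IBV_WR_RDMA_WRITE,  /* no remote CPU involvement */
            .send_flags          = IBV_SEND_SIGNALED,
            .wr.rdma.remote_addr = remote_addr,
            .wr.rdma.rkey        = rkey,
        };

        struct ibv_send_wr *bad_wr;
        return ibv_post_send(qp, &wr, &bad_wr);  /* the hardware does the rest */
    }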

 

This reduced CPU overhead increases the network’s ability to move data quickly and allows applications to receive data faster. The time interval for a given quantity of data to be transmitted from source to destination is known as latency, and the lower the latency, the faster the application job completion.

Figure 2: Traditional Interconnect

Figure 3: RDMA Zero-Copy Interconnect

FDR InfiniBand has achieved latency as low as 0.7 microseconds, far and away the lowest latency available for data transmission.

 

InfiniBand’s primary advantages over other interconnect technologies include:

  • Higher throughput – InfiniBand has consistently delivered the highest end-to-end throughput, both to the server and to the storage connection
    • In 2008, InfiniBand introduced 40Gb/s (QDR) to the market, while Ethernet supported 10Gb and Fibre Channel only 8Gb
    • In 2011, InfiniBand introduced 56Gb/s (FDR) to the market, while Ethernet supported 40Gb and Fibre Channel only 16Gb
    • 100Gb/s (EDR) InfiniBand products were launched in 2014, and 200Gb/s (HDR) will follow in the next few years, sustaining InfiniBand’s lead over competing fabrics
  • Lower latency – RDMA zero-copy networking reduces OS overhead so data can move through the network quickly
  • Enhanced scalability – InfiniBand can accommodate flat networks of around 40,000 nodes in a single subnet and up to 2^128 nodes (virtually an unlimited number) in a global network, using the same switch components and simply adding switches as the network grows
  • Higher CPU efficiency – With data movement offloads the CPU can spend more compute cycles on its applications, which will reduce run time and increase the number of jobs per day
  • Reduced management overhead – InfiniBand switches can run in Software-Defined Networking (SDN) mode, allowing them to operate as part of the fabric without CPU management
  • Simplicity – InfiniBand is exceedingly easy to install when building a simple fat-tree cluster, as opposed to Ethernet which requires knowledge of various advanced protocols to build an IT cluster

 

Most of all, InfiniBand offers a better return on investment, with higher throughput and CPU efficiency at competitive pricing, resulting in higher productivity at a lower cost per endpoint.

 

Mellanox offers a complete FDR 56Gb/s InfiniBand end-to-end portfolio for data centers and high-performance computing systems, which includes switches and cables. Mellanox’s Connect-IB adapter cards deliver leading performance with maximum bandwidth, low latency, and computing efficiency for performance-driven server and storage applications.

 

Mellanox’s SwitchX family of FDR InfiniBand switches and Unified Fabric Management software incorporate advanced tools that simplify networking management and installation, and provide the needed capabilities for the highest scalability and future growth. Mellanox’s line of FDR copper and fiber cables ensure the highest interconnect performance. With Mellanox end to end, IT managers can be assured of the highest performance and most efficient network fabric.

 

For more information on Mellanox InfiniBand products, please see https://www.mellanox.com/page/products_overview.

 

 
