All posts by Matthew Hitt

About Matthew Hitt

Matt Hitt is a Senior OEM Marketing Manager at Mellanox Technologies. His responsibilities include working with HPE and other OEMs to bring joint Mellanox and OEM solutions to market. Matt joined Mellanox in 2016 after spending more than seven years at Hewlett Packard Enterprise in various sales, marketing, and channel roles. His experience provides a unique perspective on what drives the IT industry and customers’ needs in the market, enabling him to better position key technology that solves real-world problems.

HPE and Mellanox Offer Cloud-Ready OpenNFV Solutions with Record DPDK Performance

HPE and Mellanox recently published a Solution Brief highlighting their cloud-ready OpenNFV (Network Functions Virtualization) solution, which demonstrates record DPDK performance and OVS acceleration using Mellanox ASAP2 (Accelerated Switching and Packet Processing). The results were based on HPE ProLiant Gen10 servers, Mellanox ConnectX-5 adapters, and Spectrum switches. This joint solution delivers the highest performance for NFV and enables agile virtual network function (VNF) delivery for Telco customers on industry-standard x86 hardware, without the need for proprietary appliances.

The shift to virtualization and cloud technologies among Telco/CSP (Communication Service Provider) customers has created the need to deploy and scale services quickly and efficiently. Mellanox NFV capabilities address these pain points, saving both time and money with VNFs and maximizing packet-processing efficiency. In addition, these capabilities are available out of the box and are validated and supported by both HPE and Mellanox, ensuring a seamless deployment and support experience.

 

Off-the-Shelf, Out-of-the-Box, Turnkey NFV Solution

Mellanox and HPE offer most of these incredible capabilities in pre-built, performance-optimized configurations of the HPE ProLiant DL380, DL385, and DL360 Gen10 servers. The HPE 640SFP28 adapter (based on Mellanox ConnectX-4 Lx) comes pre-configured in several off-the-shelf Built to Order (BTO) configurations, so there is no need to design a custom configuration and wait for it to be built before deploying NFV. For maximum performance, choose the 841QSFP28 2-port 100Gb VPI adapter when configuring your server. In addition, the HPE StoreFabric M-series switch family (powered by Mellanox Spectrum switches) offers exceptional top-of-rack 10/25/50/100GbE performance at unbelievable price points. Combine all of this with Mellanox LinkX cables (also available through HPE) and you have a true end-to-end NFV solution that is ready to deploy instantly and delivers incredible value. And since the entire solution is sold by HPE, you can leverage that end-to-end purchase to get the best available cost.

The Bar Has Been Raised: HPE StoreFabric M-series Ethernet Switches

One of the most unique aspects of this and other Mellanox offerings is that it runs entirely on end-to-end Mellanox fabric, all supported and sold by HPE. This not only simplifies procurement, deployment, and management, but also means you have a single point of support for the entire solution. We take it a step further with the HPE StoreFabric M-series switch family: the HPE SN2010M and SN2100M switches come in a half-width form factor, meaning you get fully redundant top-of-rack 10/25/50/100GbE switching in just 1U of rack space. No other vendor can offer this innovative feature. The entire switch family offers industry-leading performance at an unreal price. For more information, visit the HPE StoreFabric M-series page.

Mellanox ConnectX-5, The Next Generation in High-performance Interconnects

When performance matters, Mellanox delivers with 100Gb ConnectX-5 adapters. In testing done in the HPE Telco NFV Infrastructure lab, the HPE 841QSFP28 adapter achieved record DPDK performance running on HPE server hardware. Mellanox’s DPDK support delivers the highest throughput and packet rate, the lowest capex, the lowest latency, and best-in-class security. Taking it a step further, Mellanox adapters dramatically accelerate OVS workload performance using ASAP2 (Accelerated Switching and Packet Processing). Mellanox ASAP2 delivers the best performance with zero CPU overhead and the highest VXLAN throughput and packet rate (up to 10x better than OVS over DPDK) at the lowest latency, and it is open source with a broad ecosystem of partners. See the Solution Brief published by HPE here.
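To make the ASAP2 idea a little more concrete, the sketch below shows, at a high level, the kind of steps an operator typically performs to move OVS datapath processing onto a ConnectX NIC: create SR-IOV virtual functions, put the NIC’s embedded switch into switchdev mode, and enable hardware offload in Open vSwitch. This is a minimal illustration only; the interface name, PCI address, VF count, and service name are placeholder assumptions, and the exact procedure (for example, unbinding VF drivers before changing the e-switch mode) varies by driver and OS release, so follow the official HPE/Mellanox documentation for a supported deployment.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: a rough sequence of steps commonly used to
enable OVS hardware offload (ASAP2-style) on a ConnectX NIC.
The interface name, PCI address, and VF count are placeholders --
substitute values from your own system (see `ip link` / `lspci`)."""
import subprocess

PF_NETDEV = "ens1f0"        # assumed physical function netdev name
PF_PCI = "0000:08:00.0"     # assumed PCI address of the physical function
NUM_VFS = 4                 # assumed number of virtual functions

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Create SR-IOV virtual functions on the physical function.
with open(f"/sys/class/net/{PF_NETDEV}/device/sriov_numvfs", "w") as f:
    f.write(str(NUM_VFS))

# 2. Switch the NIC e-switch from "legacy" to "switchdev" mode so that
#    VF representor ports appear and flows can be offloaded to hardware.
run(["devlink", "dev", "eswitch", "set", f"pci/{PF_PCI}", "mode", "switchdev"])

# 3. Tell Open vSwitch to push datapath flows down to the NIC.
run(["ovs-vsctl", "set", "Open_vSwitch", ".", "other_config:hw-offload=true"])

# 4. Restart OVS so the setting takes effect (service name varies by distro).
run(["systemctl", "restart", "openvswitch"])
```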

 

HPE Apollo 70 System

Mellanox Socket Direct™ Technology Accelerates HPE Apollo 70 System

The HPE Apollo 70 System is a 2U Arm-based platform designed to use Socket Direct™ technology for its network interface. It is purpose-built for High Performance Computing clusters where density, scalability, and performance matter. Because it can be deployed as a single 2U system and scaled up to meet a variety of HPC workloads, customers have the flexibility to grow their clusters as needed. Overall, the Apollo 70 offers 33% more memory bandwidth than today’s industry-standard HPC servers.

The architecture of the Apollo 70 is unique in that it pairs a dual-socket design with a single PCIe x16 slot and an x8 slot, each connecting to a different socket. The Mellanox ConnectX-5 100Gb Socket Direct™ OCP adapter is fully optimized for this design, offering dual x8 connectivity that gives each socket its own direct path to the network and unleashes performance that standard server interconnects cannot match. Using a single Socket Direct capable network adapter reduces latency by removing the need for all data to pass through CPU 1 to reach the network. In addition to boosting CPU 2 performance, this design maximizes the PCIe lanes available to memory and GPUs. As an added benefit, a Socket Direct™ enabled adapter presents itself to the server management interface as a single interconnect, which simplifies network management.

Socket Direct™ at a glance

Mellanox ConnectX-5 with Socket Direct™ provides 100Gb port speed, even to servers without x16 PCIe slots, by splitting the 16-lane PCIe bus into two x8 connections. For the Apollo 70, Mellanox and HPE worked together to develop a single OCP card that bridges the two sockets at the PCIe bus. In other cases, Socket Direct™ can be achieved with a PCIe x8 edge connector and a parallel x8 auxiliary PCIe connection card. Mellanox uses Multi-Host™ technology to allow multiple hosts to connect to a single adapter by separating the PCIe interface into multiple independent interfaces. Socket Direct also enables GPUDirect® RDMA for all CPU/GPU pairs by ensuring that all GPUs are linked to the CPUs closest to the adapter card, and it boosts performance on both sockets by creating a direct connection between the sockets and the adapter card.
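As a small, hedged illustration of the socket-affinity point (not part of the HPE lab setup), the Linux sysfs tree reports which NUMA node, and therefore which CPU socket, sits behind each network interface’s PCIe function. A script like the one below can confirm where an adapter is attached; interface names and node numbers are system-specific.

```python
#!/usr/bin/env python3
"""Illustrative sketch only: report the NUMA node (CPU socket) behind each
network interface's PCIe function. With a conventional single-slot adapter
every port reports one node; with Socket Direct each x8 half of the adapter
attaches to a different socket. Interface names vary by system."""
import glob

for dev_path in sorted(glob.glob("/sys/class/net/*/device")):
    iface = dev_path.split("/")[4]          # e.g. "ens1f0"
    try:
        node = open(dev_path + "/numa_node").read().strip()
        cpus = open(dev_path + "/local_cpulist").read().strip()
    except OSError:
        continue                            # no PCIe topology info exposed
    print(f"{iface:12s} NUMA node {node:>2}  local CPUs {cpus}")
```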

Real Life Scenario Lab Testing

In a real-life scenario, applications running on a dual-socket server generate data that traverses the CPUs over the inter-processor communication bus. To obtain a more realistic measurement of network performance, we applied a test load on the inter-processor bus and then measured the effect of this load on the server’s external data traffic. We took these measurements while comparing the two adapter types (a standard adapter and Socket Direct™). Figure 4 compares the average latency of the two adapters. The graph shows that the Socket Direct™ adapter reduces latency by 80% compared to the standard adapter. This improvement is a result of the direct path both CPU sockets take to reach the network and the even distribution of TCP streams between the CPUs.

Figure 4

Figure 5 shows CPU utilization. It is evident that direct access to the network using Socket Direct™ also provides a 50% improvement in CPU utilization. Moreover, the even distribution of TCP streams reduces the average cache miss count on both CPUs versus a standard configuration server, which further improves CPU utilization.

Figure 5

When comparing the servers’ external throughput while applying the inter-processor load (Figure 6), it is evident that by implementing Socket Direct™, the throughput is improved by 16%-28% compared to the standard adapter connection.

 

Figure 6


Why HPE chose Mellanox

Mellanox ConnectX-5 delivers high bandwidth, low latency, and high computation efficiency for high-performance, data-intensive, and scalable compute and storage platforms. As a leader in the HPC interconnect market, Mellanox offers several advantages that accelerate the Apollo 70 System, including Socket Direct™ and Multi-Host™, as well as the multiple offloads provided by ConnectX-5. Our strong partnership with HPE, combined with open standards like OCP, made it possible to develop a platform and adapter from the ground up to meet the demands of specific HPC workloads.

Fully Virtualized for Enterprise Clouds

Whether ConnectX-5 is used for HPC or Ethernet network connectivity, almost all aspects of data center connections are virtualized today, including networks, network devices, and host interfaces. All network connections are defined by software, enabling any server to readily connect to any network, storage, or service. To accomplish this, ConnectX adapters offer a comprehensive set of network and I/O virtualization features:

  • Overlay networks: The adapter incorporates overlay capabilities that enable isolated Ethernet networks to reside within the fabric. These networks operate at the full fabric bandwidth of the adapter (up to 100 Gb/s). Furthermore, the adapter supports InfiniBand, Ethernet, and IP isolation mechanisms such as partitions, VLANs, and subnets.
  • Virtualized server I/O: Virtual machines are presented with virtual InfiniBand or Ethernet adapters. These virtual network adapters join the fabric through a virtual switch; one virtual switch is supported per physical port.
  • Port virtualization: Support for both SR-IOV and paravirtualization provides seamless hypervisor integration with Open vSwitch.
  • RDMA enabled: Both InfiniBand and Ethernet support RDMA, improving server and storage efficiency and enabling applications to run faster and to be deployed with fewer systems (see the short sketch after this list).
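As a hedged illustration of the RDMA point above (not something from the solution brief), standard Linux sysfs paths expose the RDMA devices a ConnectX adapter registers and whether each port runs InfiniBand or RoCE (RDMA over Converged Ethernet). Device names such as mlx5_0 are assumptions that depend on the system.

```python
#!/usr/bin/env python3
"""Illustrative sketch: list RDMA-capable devices through standard Linux
sysfs paths and show whether each port runs over InfiniBand or Ethernet
(RoCE). Device names such as mlx5_0 are system-specific."""
import glob, os

def read_attr(port_dir, name):
    try:
        with open(os.path.join(port_dir, name)) as f:
            return f.read().strip()
    except OSError:
        return "n/a"

for port_dir in sorted(glob.glob("/sys/class/infiniband/*/ports/*")):
    parts = port_dir.split("/")
    dev, port = parts[4], parts[6]
    # link_layer reads "InfiniBand" for IB ports and "Ethernet" for RoCE ports
    print(f"{dev} port {port}: link_layer={read_attr(port_dir, 'link_layer')} "
          f"state={read_attr(port_dir, 'state')} rate={read_attr(port_dir, 'rate')}")
```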

 

Adapter Key Features

InfiniBand

  • EDR/FDR/QDR/DDR/SDR
  • Offloads
    • Tag matching and Rendezvous
    • NVMe over Fabric (NVMe-oF)
    • Burst buffer offloads for background checkpointing
    • Adaptive routing on reliable transport
    • vSwitch/vRouter offloads/Open vSwitch (OVS)
    • Erasure Coding
    • T10 DIF
  • PXE boot over Ethernet or InfiniBand
  • Virtual Protocol Interconnect (VPI)
  • Mellanox PeerDirect™ RDMA
  • Dynamically Connected Transport (DCT)
  • On demand paging (ODP)
  • Extended Reliable Connected transport (XRC)
  • End-to-end QoS and congestion control

 

Ethernet

  • Virtual Protocol Interconnect (VPI)
  • RoCEv1
  • RoCEv2
  • Mellanox PeerDirect™ RDMA
  • On demand paging (ODP)
  • NVMe over Fabric (NVMf) Target Offloads
  • Enhanced vSwitch / vRouter Offloads
  • Hardware offloads for NVGRE and VXLAN encapsulated traffic
  • End-to-end QoS and congestion control

 

Conclusion

Mellanox Socket Direct adapters provide the highest performance and most flexible solution for the most demanding applications and markets. Socket Direct extends server performance and utilization with maximum-throughput connectivity. Within a dual-socket server, the Socket Direct adapter enables both CPUs to connect directly to the network, delivering lower latency, lower CPU utilization, and higher network throughput. With added virtualization features and support for multiple protocols and form factors, the Mellanox ConnectX family provides high performance and the most efficient network infrastructure. With Mellanox, there is no need to compromise performance, security, or usability in high-performance virtualized environments.

For more information, please see the HPE QuickSpecs.

For more information on Mellanox ConnectX-5 Socket Direct, please see the Product Brief.

For more information on the HPE Apollo 70, please click here.

Powering Artificial Intelligence with the World’s Best Ethernet Solutions

Before you break out the torches and pitchforks, let’s play this out. Many of today’s advanced HPC workloads, such as AI, machine learning, and data analytics, are possible over Ethernet. With Mellanox Spectrum™ 25/50/100Gb Ethernet switches and ConnectX® network adapters, there is a very compelling argument that Ethernet may be the best and most efficient networking solution for them. And it’s becoming today’s new reality.

Let’s face it, 10GbE networks can’t support the bandwidth needed by artificial intelligence and other demanding data-driven workloads. These newer technologies demand amazing computational capabilities and place huge demands on the network infrastructure. When 10GbE was first announced, it was considered overkill for most applications and workloads…and you had plenty of capacity on your 386 processor and your 1.44 MB 3.5-inch floppy disk. Today’s screaming-fast CPU speeds, multi-core processors, and the advent of GPUs demand increased network speeds to move data in and out. State-of-the-art NVMe storage is setting new performance standards for storage, and NVMe over Fabrics is already taking hold in the industry for faster transfer of data between the host computer and wickedly fast solid-state storage. With 200GbE speeds on the horizon, 10GbE is fading into the pile of yesterday’s technology. I already have a spot reserved on my display shelf for a 10GbE network adapter, right next to my Palm Pilot and Nintendo 64 (both still work, by the way!).

Let’s walk through a check list of what already makes the Mellanox Ethernet switches and adapters the best network solution on the market today:

  • Highest Bandwidth – 25/50/100Gb and soon 200GbE
  • ZERO Packet Loss – Predictable Performance
  • Highest Efficiency – RoCE offload capabilities
  • Lowest Latency – 1.4X better than the competition, true cut-through latency
  • Lowest Power Consumption – 1.3X better than the competition
  • Dynamically Shared, Flexible Buffering – Excellent flexibility to dynamically adapt
  • Advanced Load Balancing – Improves scale and availability
  • Predictable Performance – 3.2Tb/s Switching capacity with wire speed performance
  • Open – Support for multiple open operating systems, including Mellanox OS (Onyx) and Cumulus
  • Low Cost – Superior Value

For both deep learning and inference, the accuracy of real-time decisions from today’s most robust cognitive computing applications depends on fast data delivery. Mellanox end-to-end Ethernet solutions meet and exceed the most demanding criteria and leave the competition in the dust. Just ask iFLYTEK, one of our customers currently using Mellanox Ethernet for speech recognition technology.

The chart below makes it easy to see the performance advantage of TensorFlow over a Mellanox 100GbE network versus a 10GbE network, with both taking advantage of RDMA. While distributed TensorFlow takes full advantage of RDMA to eliminate processing bottlenecks, even with large-scale images the Mellanox 100GbE network delivers the expected performance and exceptional scalability from the 32 NVIDIA Tesla P100 GPUs. Against both 25GbE and 100GbE, it’s evident that those still using 10GbE are falling short of the return on investment they might have thought they were achieving.
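For readers curious about where RDMA enters the picture in distributed TensorFlow, the sketch below shows how a TensorFlow 1.x build compiled with verbs support selects the RDMA transport. It is not the benchmark configuration behind the chart, and the host names, ports, and job layout are placeholders.

```python
#!/usr/bin/env python3
"""Minimal sketch, assuming a TensorFlow 1.x build compiled with verbs
(RDMA) support. Selecting the "grpc+verbs" protocol keeps gRPC for control
messages but moves tensor transfers between workers onto RDMA. Host names,
ports, and the job layout are placeholders."""
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# protocol="grpc+verbs" is only available when TensorFlow is built with
# verbs support; otherwise the server falls back to plain gRPC over TCP.
server = tf.train.Server(cluster,
                         job_name="worker",
                         task_index=0,
                         protocol="grpc+verbs")
server.join()
```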

Beyond the performance advantages, the economic benefits of running AI workloads over Mellanox 25/50/100GbE are substantial. Spectrum switches and ConnectX network adapters deliver unbeatable performance at an even more unbeatable price point, yielding an outstanding ROI. With flexible port counts and cable options allowing up to 64 fully redundant 10/25/50 GbE ports in a 1U rack space, Mellanox end-to-end Ethernet solutions are a game changer for state-of-the-art data centers that wish to maximize the value of their data.

Supporting Resources: