What does Football have to do with NVMe-oF?

 
NVMe Over Fabrics

What NVMe-oF is, and what the Mellanox and Pure Storage FlashArray//X partnership means.

Everybody likes a good analogy, and it’s even better if it’s a football analogy. First, to clarify: by football I mean soccer.  I know that in some places that’s an important clarification.  In my neighborhood, we played football and we used our legs 😊.

Now that we’ve settled that, I’d like to present my view on NVMe-oF.

Recently, Pure Storage announced their NVMe solution – FlashArray//X.  By doing so, Pure joined a select group of vendors offering this fast storage solution – the fastest production-level storage solution available today.

Mellanox Technologies partnered with Pure Storage to provide NVMe-oF.   It’s interesting that Mellanox is almost always the network partner for vendors providing an NVMe-oF solution.  Why is that?

Simply put, NVMe-oF stands for NVMe over Fabrics – the way to extend NVMe storage over the network.

NVMe over Fabrics is essentially NVMe over RDMA, and RDMA is basically Mellanox.  So, to summarize: NVMe-oF means NVMe over a Mellanox network.

To clarify, RDMA is not a proprietary Mellanox protocol – RDMA is a standard.   NVMe-oF solutions can be implemented over any network, but the fact is – Mellanox does RDMA best, simply because Mellanox has been doing RDMA for 20 years.
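
To make this more concrete, here is a minimal host-side sketch of what an NVMe-oF connection over RDMA looks like on a Linux client, using the standard nvme-cli tool. The IP address, port and subsystem NQN below are placeholders for illustration only, not values from a real FlashArray//X deployment:

    # Load the NVMe over RDMA transport driver
    modprobe nvme-rdma

    # Discover the NVMe-oF subsystems exposed by the target (placeholder address and port)
    nvme discover -t rdma -a 192.168.10.20 -s 4420

    # Connect to a discovered subsystem (placeholder NQN)
    nvme connect -t rdma -a 192.168.10.20 -s 4420 -n nqn.2019-05.com.example:subsystem1

    # The remote namespaces now appear as regular local NVMe block devices
    nvme list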

So, what is RDMA in a nutshell?

In Ethernet environments, it is also called RoCE (RDMA over Converged Ethernet).

NVMe cuts out legacy storage communication, such as SCSI, on the local storage node to enable fast storage.  Similarly, as Diagram 1 below shows, RDMA cuts out the legacy network stack on the server to enable the fastest way to copy memory between nodes over the network.

RDMA (RoCE in Ethernet environments) provides the application with direct access to the Network Interface Card (NIC).   By bypassing the kernel and the TCP/IP stack on the client and storage nodes, RDMA improves both speed and CPU utilization: speed, because the data transfer is offloaded to hardware in the NIC, and CPU utilization, because the CPU is no longer needed to perform a simple memory copy over the network.

What does this mean? The equation is simple: less CPU spent on the network means more CPU for storage and compute, and that means more IOPS, more processing – faster applications.

Diagram 1

RoCE: RDMA over Converged Ethernet

What are the RDMA prerequisites?

  1. An application that knows how to “talk” RDMA. This prerequisite is fulfilled by our great partners, such as Pure Storage. For example, Pure Storage provides RDMA support in FlashArray//X.
  2. A Network Interface Card (NIC) that supports RDMA offload – that is 100% the Mellanox ConnectX family of NICs (see the quick host-side check right after this list).
  3. A network that can handle RoCE. The switches need to support Data Center Bridging (DCB) features like Priority Flow Control (PFC) and Explicit Congestion Notification (ECN) at a minimum, but the best RDMA network will use switches that are RDMA aware.
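
Prerequisite 2 is easy to verify on a Linux host before going any further; here is a quick sketch, assuming the rdma-core user-space tools are installed (the device name mlx5_0 is just a typical example for a ConnectX adapter):

    # List the RDMA-capable devices the host can see
    ibv_devices

    # Show the details of one device – for RoCE the link layer should read "Ethernet"
    ibv_devinfo -d mlx5_0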

Why do we need a switch that is aware of RDMA?

As I mentioned above, RDMA cuts out the TCP/IP stack.   But wait a minute – TCP has a very simple, yet important, role in the Ethernet fabric: TCP retransmission.  This capability is required to recover packets that are lost for some reason.

Let’s look at Diagram 1 again and notice the red arrow between the NICs.  That red arrow is actually a network, with at least three switches in the path between the compute and storage nodes:

Top of Rack <-> Aggregation <-> Top of Rack

 

Mellanox Spectrum

What will happen if packets are dropped in this network? This can easily happen due to congestion or other reasons, and since TCP is not being used, what will force the retransmission of those packets?

Here comes the football analogy – think of each player as a switch or a NIC and the ball as a packet, playing in an NVMe-oF match.

 

 

We need to make sure that in the NVMe-oF match, any pass between two players is completed successfully.  For this, we need the best players in the world on the pitch, just like we would want in a football match. How can we make sure of that?

 

 

This is exactly why we need the DCB features.  We need the network itself to take responsibility for the traffic and make sure it flows without any drops.

We need football players that can make the pass fast and complete it 100% of the time – so that we have a perfect match.

Wow … We need 11 Lionel Messis on our pitch.

It’s true that many network vendors implement the basic required features in their switches.  But only Mellanox, as THE RDMA company, has a switch that is aware of RDMA’s needs.  Mellanox Spectrum switches have a best-in-class buffer architecture with the lowest latency for NVMe-oF, and the buffer settings for the RDMA profile are configured automatically.   Furthermore, Mellanox Spectrum switches offer easy end-to-end provisioning and monitoring for RDMA, simplifying NVMe-oF provisioning and monitoring in the network.

So basically, you can think of the Mellanox Spectrum switch as the Messi of switches 😊

Here is an example of a lossless network profile configuration, where the buffers are configured according to the profile and PFC is enabled across the switch.
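
The following is a minimal sketch in Onyx-style CLI syntax. The port, the priority value (3 is commonly used for RoCE traffic) and the ECN thresholds are illustrative assumptions, and the exact commands differ between switch operating system releases, so treat this as a sketch rather than a definitive configuration:

    ## Enable PFC globally and make priority 3 lossless
    switch (config) # dcb priority-flow-control enable force
    switch (config) # dcb priority-flow-control priority 3 enable

    ## Turn PFC on for the ports facing the hosts and the storage array
    switch (config) # interface ethernet 1/1 dcb priority-flow-control mode on force

    ## Enable ECN marking for the same traffic class on those ports
    switch (config) # interface ethernet 1/1 traffic-class 3 congestion-control ecn minimum-absolute 150K maximum-absolute 1500K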

A show command is then available to present the resulting buffer allocation.

Mellanox’s Spectrum line, powered by the Mellanox Spectrum ASIC

Our Spectrum switches have many advantages in addition to the extreme performance that is critical for NVMe-oF environments.  Spectrum switches are Open Ethernet switches, offering a choice of Network Operating Systems (NOS): it can be Onyx, it can be Cumulus Linux – the performance will be the same.  When it comes to performance, it’s all about the ASIC in the switch, and the Spectrum line of products is powered by the Mellanox Spectrum ASIC – with low latency, zero packet loss and best-in-class buffers.

How great is that?  Same top performance, and our Spectrum switches can be the Onyx Messi or the Cumulus Ronaldo.  We can choose our favorite – I will leave it to you to decide.  I have another favorite player, one legendary number 10, but that is for another blog…

 


About Avi Alkobi

Avi Alkobi is Ethernet Technical Marketing Manager for EMEA at Mellanox Technologies. For the past 8 years he has worked at Mellanox in various roles focusing on the Ethernet switch product line, first as a software developer, then as team leader of the infrastructure team responsible for the first Ethernet switch management infrastructure. More recently, he has worked as a senior application engineer supporting the field on post-sales, pre-sales and complex proofs of concept for end-to-end Ethernet and InfiniBand solutions. Before coming to Mellanox, he worked for five years at Comverse Technology and, prior to that, in the Israeli security industry as a software developer. He holds a Bachelor of Science degree in Computer Science and an M.B.A. from Bar-Ilan University in Israel.
