Ethernet Storage Fabric – Part 2


Build the best ESF with Optimized ESF Switches and the Leaf-Spine Architecture

An Ethernet Storage Fabric, or ESF for short, is the fastest and most efficient way to network storage. It leverages the speed, flexibility, and cost efficiencies of Ethernet with the best switching hardware and software packaged in ideal form factors to provide performance, scalability, intelligence, high availability, and simplified management for scale-out storage and hyperconverged infrastructure. Part 1 of this ESF blog explains what an ESF is, its benefits, and why the ESF has gained wide adoption, replacing Fibre Channel in modern datacenters. Part 2 continues the discussion with how to build an ESF the right way.

Ethernet Storage Fabric is Best Built on the Leaf-Spine Architecture

Many traditional datacenter networks are built on a Three-Tier Architecture. In this framework, when a service in one physical domain needs to reach another domain, the traffic often flows north-south. For example, the request from the web server goes upstream to the aggregation and core layers and then travels down to the SAN storage in another physical domain. The response data traverses the three layers in the same fashion but in reverse. The individual switches are often high-latency and built with inherent blocking inside the switch and/or oversubscription on the uplinks, meaning the connections between the switch layers will become bottlenecks if the traffic load increases beyond the original network design.

In scale-out storage or hyperconverged infrastructure (HCI), compute and storage are “glued” into unified resource pools. HCI takes this one step further: all applications are virtualized to run on virtual machines (VMs) or containers, and are distributed (and migrated) across the compute/storage pools using policy-based automation. Access to the storage pool, data protection mechanisms (replication, backup, snapshots, and/or recovery), and VM or container migration for load balancing and failover now generate a deluge of network traffic between the nodes in the cluster(s), which is called east-west traffic. This large amount of east-west traffic, running through a three-tier network that was not designed for it, leads to a higher rate of oversubscription from the access layer to the aggregation and core layers. This in turn will inevitably cause congestion and high long-tail latencies. The resulting degraded, unpredictable performance is ill-suited for storage I/Os, especially when using flash storage or when supporting latency-sensitive database, analytics, machine learning, or e-commerce workloads.

To overcome these architectural shortcomings, modern datacenters are adopting the Leaf-Spine Architecture for scale-out storage/HCI (and big data analytics, machine learning, private cloud, etc.). The leaf-spine architecture has a simple topology wherein every leaf switch is directly connected to every spine switch, so any pair of leaf switches communicates through a single spine hop, ensuring consistent and predictable latency. By using Open Shortest Path First (OSPF) or Border Gateway Protocol (BGP) with Equal Cost Multi-Pathing (ECMP), your network spreads traffic across all available links and makes full use of their capacity. When network traffic increases, adding more links between each leaf and the spines provides additional bandwidth between leaf switches and avoids oversubscription, which in turn helps avoid congestion and added latency. Furthermore, as more and more scale-out storage/HCI deployments take the hybrid cloud approach, using Layer-3 protocols with standards-based VXLAN/EVPN will seamlessly scale Layer-2 storage domains across datacenter/cloud boundaries with performance, mobility, and security, to ensure business continuity.
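As a rough illustration of the two ideas above, the sketch below computes a leaf's oversubscription ratio and shows how ECMP might pin a flow to one of several equal-cost uplinks. The port counts, speeds, uplink names, and addresses are hypothetical, and the CRC-based hash is only a stand-in for whatever hash a real switch computes in hardware.

```python
import zlib

def oversubscription_ratio(down_ports, down_gbps, up_ports, up_gbps):
    """Server-facing bandwidth divided by spine-facing bandwidth on one leaf.

    A ratio of 1.0 or less means the uplinks can carry all downlink traffic.
    """
    return (down_ports * down_gbps) / (up_ports * up_gbps)

def ecmp_pick_uplink(flow, uplinks):
    """Pick one equal-cost uplink by hashing the flow's 5-tuple.

    Illustrative only: real switches compute this hash in hardware.
    Packets of one flow always map to the same uplink, so they stay in order.
    """
    key = "|".join(str(field) for field in flow).encode()
    return uplinks[zlib.crc32(key) % len(uplinks)]

# Hypothetical leaf: 16 nodes at 25GbE, one 100GbE uplink to each of 2 spines.
two_uplinks = oversubscription_ratio(16, 25, 2, 100)
# Doubling the uplinks (two per spine) removes the oversubscription.
four_uplinks = oversubscription_ratio(16, 25, 4, 100)

# Hypothetical uplink names and flow (src, dst, protocol, src port, dst port).
uplinks = ["spine1-a", "spine1-b", "spine2-a", "spine2-b"]
flow = ("10.0.1.5", "10.0.2.9", 6, 49152, 4420)
chosen = ecmp_pick_uplink(flow, uplinks)
print(two_uplinks, four_uplinks, chosen)
```

In this example the leaf goes from 2:1 oversubscribed to non-blocking simply by adding uplinks, which is the scaling step described above.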

Ethernet Storage Fabric is Built with Dedicated ESF Switches

ESF has to be a transparent network fabric for scale-out storage and HCI, which means that access to remote data offers almost the same performance as access to local data, from the application’s perspective. This translates into close-to-local predictable latency, line-rate throughput with QoS, and linear scalability to accommodate dynamic, agile data movement between nodes – all in a simple and cost-effective way.

With the leaf-spine architecture, the congestion, increased latency, and unpredictable performance caused by traffic jams in the traditional three-tier network are now gone. Within the datacenter, any storage/HCI I/O traverses the ESF in a single hop if the end points are in the same rack, or in three hops if they are in different racks. However, dedicated ESF switches are required to construct the fabric so that storage/HCI traffic, including bursty I/Os and data flows from faster devices such as NVMe SSDs, can always reach the destination with predictable response time. Using a switch not designed for the demands of an ESF can result in higher and unpredictable network latencies even with the more efficient leaf-spine network designs. As mentioned above, that’s exactly what you are trying to avoid for scale-out storage or HCI.
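The one-hop and three-hop cases can be sketched as a small path function. The rack and switch names here are hypothetical, and the sketch assumes one leaf per rack and a single chosen spine for the cross-rack case.

```python
def fabric_path(src_rack, dst_rack, spine="spine-1"):
    """Return the switches an I/O crosses between two nodes in a leaf-spine fabric."""
    src_leaf = f"leaf-{src_rack}"
    if src_rack == dst_rack:
        # Same rack: only the shared leaf (top-of-rack) switch is involved.
        return [src_leaf]
    # Different racks: source leaf, one spine, destination leaf.
    return [src_leaf, spine, f"leaf-{dst_rack}"]

same_rack = fabric_path("r1", "r1")
cross_rack = fabric_path("r1", "r7")
print(len(same_rack), same_rack)
print(len(cross_rack), cross_rack)
```

Because the hop count never exceeds three, per-switch latency becomes the main variable in the fabric's worst-case latency.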

In addition, more and more storage and HCI (and big data, and machine learning) platforms employ RDMA over Converged Ethernet (RoCE) to deliver faster network performance and more efficient CPU utilization. As a result, optimized congestion management and QoS are required in the ESF switches to deliver a non-disruptive and transparent network fabric for business-critical applications.

As the leaf-spine architecture makes ESF extremely easy to scale, the ESF switches need to be simple to configure for fast and easy deployment and scale-out. Automated network provisioning, monitoring, and management are required for virtualized workloads and storage traffic. So is seamless integration with clouds: secure, isolated, and agile workspaces for multiple tenants.

Not every datacenter switch can meet these requirements. Dedicated ESF switches such as Mellanox Spectrum™ switches are required.

Mellanox ESF Switches

Mellanox Spectrum switches are storage optimized. They provide line-rate, zero packet loss, resilient network performance and enable a high-density, scalable rack design, and they are non-blocking both internally and with the number and bandwidth of their uplinks.

  • Ideal port count and optimal form factor. Most scale-out storage or HCI racks contain no more than 16 nodes, so building a new storage or hyperconverged cluster using two standard 32-port or 48-port switches leaves many ports unused. The Mellanox SN2010/2100 switches provide enough ports in a half-width, 1U form factor. Two of these switches can be installed side by side in a 1U rack space for high availability. This makes it possible to house the entire datacenter with 4 HCI nodes in a 3U appliance, or 16 nodes and two switches in less than half a rack. At the opposite end of the spectrum, using break-out cables, the SN2100 supports up to 64 10/25GbE ports in the half-width, 1U form factor, and enables the highest-density rack design.
  • Consistent, high performance. An ESF must keep up with faster storage and business-critical applications. NVMe SSDs with Intel Optane (3D XPoint) technology have achieved latency of 10 microseconds or less. At this level, even a few hundred nanoseconds of network latency will significantly impact storage and application performance, especially when traversing multiple switches. Other Ethernet switches often produce latencies in the tens of microseconds in actual deployments, whereas Spectrum switches have ~300ns port-to-port latency and zero packet loss, regardless of frame sizes and speeds. Furthermore, Spectrum switches are designed with a shared buffer, resulting in maximum micro-burst absorption capacity. As shown in the diagram below, Spectrum switches are the only ESF switches supporting the fastest types of storage. Refer to Tolly Report for more details.

  • RoCE optimization. RoCE (RDMA over Converged Ethernet) is the only way to deliver close-to-local latency for fast storage and in-memory applications such as NVMe-oF, Microsoft Storage Spaces Direct with SMB 3.0, Spark, IBM Spectrum Scale (GPFS), etc. The optimized buffer design in Spectrum, combined with storage-aware QoS and faster Explicit Congestion Notification (ECN as supported in RoCE v2), delivers optimal congestion management for RoCE traffic. For example, Tencent achieved record-setting performance for big-data analytics with Spectrum. Deploying RoCE on Spectrum switches is simple, with three pre-defined profiles for buffer configuration. Refer to this Mellanox community post for more detailed information.
  • Automated network provisioning, monitoring and troubleshooting. Performance, configuration, and support issues in scale-out storage and HCI are often network related. Zero-Touch Provisioning is provided through Ansible integration and Mellanox’s network orchestration and management software, NEO™. Ansible Playbooks and NEO not only improve operational efficiency but also eliminate network downtime caused by human errors. NEO provides network visibility, performance and health monitoring, plus alerts/notifications to storage/HCI administrators, and guides them in troubleshooting. Because it is REST API-based, NEO can be easily integrated with scale-out storage or HCI software. For example, NEO is integrated with Nutanix AHV to provide automated VM-level network provisioning.
  • Cloud scale and future-proof. Mellanox Spectrum TOR (leaf) and aggregation (spine) switches allow you to scale from half rack, full rack, multiple racks, to multiple datacenters. With rich L2/L3 features, including VXLAN/EVPN support for DCI (Data Center Interconnect), and NEO-driven network automation, Spectrum switches bring cloud-scale performance and manageability to scale-out storage and HCI. And by supporting all speeds including 10/25/40/50/100GbE, the same Spectrum ESF switches that you use today will continue servicing your needs when you migrate to next-generation scale-out storage or HCI platforms that require higher speeds.
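To put the latency figures from the performance bullet in perspective, here is a back-of-the-envelope sketch of how much a three-hop, cross-rack path adds on top of a ~10 microsecond NVMe device. It counts switch port-to-port latency only (no NIC, cable, or software overhead), and the 10 microsecond figure for "other" switches is an assumption taken as the low end of "tens of microseconds".

```python
DEVICE_NS = 10_000          # ~10 microsecond NVMe SSD latency, from the text
SPECTRUM_SWITCH_NS = 300    # ~300 ns port-to-port latency, from the text
OTHER_SWITCH_NS = 10_000    # assumption: low end of "tens of microseconds"
CROSS_RACK_HOPS = 3         # leaf -> spine -> leaf

def fabric_overhead_percent(per_switch_ns, hops=CROSS_RACK_HOPS, device_ns=DEVICE_NS):
    """Switch latency across the fabric as a percentage of device latency."""
    return 100 * per_switch_ns * hops / device_ns

spectrum_pct = fabric_overhead_percent(SPECTRUM_SWITCH_NS)
other_pct = fabric_overhead_percent(OTHER_SWITCH_NS)
print(f"~300 ns switches add {spectrum_pct:.0f}% on top of device latency")
print(f"~10 us switches add {other_pct:.0f}% on top of device latency")
```

Under these assumptions, the fabric adds roughly a tenth of the device latency in one case and several times the device latency in the other, which is why per-switch latency matters so much for fast flash.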

In a nutshell, Mellanox Spectrum switches make a perfect foundation for an ESF. Simple to deploy, easy to scale, and free of network bottlenecks, they allow scale-out storage and HCI to truly disaggregate data processing from data location, to achieve performance and scale. And Mellanox Spectrum switches deliver all these benefits in such an efficient way that you can spend less on networking, and more on your data and applications!

You can find more technical details about Mellanox Spectrum Switches in the Mellanox Community and on

Follow us on Twitter: @MellanoxTech.


About Jeff Shao

Jeff Shao is Director, Ethernet Alliances at Mellanox Technologies. Prior to Mellanox, he held senior product management and marketing roles at LSI (Avago), as well as Micrel, Vitesse Semiconductor & Promise Technology. He holds an MBA from the University of California, Berkeley and a Bachelor of Science in Physics from the University of Science & Technology of China.
