All posts by Jeff Shao

About Jeff Shao

Jeff Shao is Director, Ethernet Alliances at Mellanox Technologies. Prior to Mellanox, he held senior product management and marketing roles at LSI (Avago), as well as Micrel, Vitesse Semiconductor, and Promise Technology. He holds an MBA from the University of California, Berkeley and a Bachelor of Science in Physics from the University of Science & Technology of China.

Ethernet Storage Fabric – Part 2

An Ethernet Storage Fabric, or ESF in short, is the fastest and most efficient way to network storage. It leverages the speed, flexibility, and cost efficiencies of Ethernet with the best switching hardware and software packaged in ideal form factors to provide performance, scalability, intelligence, high availability, and simplified management for scale-out storage and hyperconverged infrastructure. Part 1 of this ESF blog explains what an ESF is, its benefits, and why the ESF has gained wide adoption, replacing Fibre Channel in modern datacenters. In Part 2, we continue the discussion on how to build an ESF the right way.

Ethernet Storage Fabric is Best Built on the Leaf-Spine Architecture

Many traditional datacenter networks are built on a Three-Tier Architecture. In this framework, when a service in one physical domain needs to reach another domain, the traffic often flows north-south. For example, the request from the web server goes upstream to the aggregation and core layers and then travels down to the SAN storage in another physical domain. The response data traverses the three layers in the same fashion but in reverse. The individual switches are often high-latency and built with inherent blocking inside the switch and/or oversubscription on the uplinks, meaning the connections between the switch layers become bottlenecks if the traffic load increases beyond the original network design.
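
The oversubscription described above can be quantified as a simple ratio of access-side bandwidth to uplink bandwidth. The sketch below uses illustrative port counts and speeds, not figures from any specific switch:

```python
def oversubscription_ratio(access_ports, access_gbps, uplink_ports, uplink_gbps):
    """Ratio of total access-side bandwidth to total uplink bandwidth.

    A ratio above 1:1 means the uplinks become a bottleneck when enough
    access ports transmit upstream at line rate at the same time.
    """
    return (access_ports * access_gbps) / (uplink_ports * uplink_gbps)

# A hypothetical 48 x 10GbE access switch with 4 x 40GbE uplinks:
# 480 Gb/s of access bandwidth vs. 160 Gb/s up, i.e. 3:1 oversubscribed.
ratio = oversubscription_ratio(48, 10, 4, 40)
print(f"{ratio:.0f}:1")  # 3:1
```

A 1:1 (non-blocking) design simply keeps this ratio at or below one as the load grows.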

In scale-out storage or hyperconverged infrastructure (HCI), compute and storage are “glued” into unified resource pools. HCI takes this one step further whereby all applications are virtualized to run on virtual machines (VMs), or containers, and distributed (and migrated) across the compute/storage pools using policy-based automation. Access to the storage pool, data protection mechanisms (replication, backup, snapshots, and/or recovery), and VM or container migration for load balancing and failover now generate a deluge of network traffic between the nodes in the cluster(s), which is called east-west traffic. This large amount of east-west traffic, running through a three-tier network that was not designed for it, leads to a higher rate of oversubscription from the access layer to the aggregation and core layers. This in turn will inevitably cause congestion and high long-tail latencies. The resulting degraded, unpredictable performance is ill-suited for storage I/Os, especially when using flash storage or when supporting latency-sensitive database, analytics, machine learning, or e-commerce workloads.

To overcome these architectural shortcomings, modern datacenters are adopting the Leaf-Spine Architecture for scale-out storage/HCI (and big data analytics, machine learning, private cloud, etc.). The leaf-spine architecture has a simple topology wherein every leaf switch is directly connected to every spine switch, and any pair of leaf switches communicates with a single hop, ensuring consistent and predictable latency. By using Open Shortest Path First (OSPF) or Border Gateway Protocol (BGP) with Equal Cost Multi-Pathing (ECMP), your network utilizes all available links and achieves maximal link capacity utilization. When network traffic increases, adding more links between each leaf and its spine can easily provide additional bandwidth between leaf switches to avoid oversubscription, which helps avoid congestion and latency. Furthermore, as more and more scale-out storage/HCI deployments take the hybrid cloud approach, using Layer-3 protocols with standards-based VXLAN/EVPN will seamlessly scale Layer-2 storage domains across datacenter/cloud boundaries with performance, mobility, and security, to ensure business continuity.
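
The ECMP behavior described above can be illustrated with a toy model (this is not the hash implemented in switch hardware): each flow’s 5-tuple is hashed to pick one of the equal-cost spine uplinks, so all packets of one flow stay in order on a single path, while many flows spread across all spines:

```python
import hashlib

def ecmp_uplink(src_ip, dst_ip, src_port, dst_port, proto, num_spines):
    """Pick one of the equal-cost spine uplinks by hashing the flow 5-tuple.

    Deterministic per flow (no packet reordering), while distinct flows
    spread across all available spines.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_spines

# Every packet of one flow maps to the same spine ...
path = ecmp_uplink("10.0.1.5", "10.0.2.9", 49152, 4420, "tcp", 4)
assert path == ecmp_uplink("10.0.1.5", "10.0.2.9", 49152, 4420, "tcp", 4)

# ... while 100 flows (varying source ports) use all four spines.
paths = {ecmp_uplink("10.0.1.5", "10.0.2.9", p, 4420, "tcp", 4)
         for p in range(49152, 49252)}
```

Real ASICs compute this hash in hardware at line rate; the principle is the same.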

Ethernet Storage Fabric is Built with Dedicated ESF Switches

An ESF has to be a transparent network fabric for scale-out storage and HCI, which means that access to remote data offers almost the same performance as access to local data, from the application’s perspective. This translates into close-to-local predictable latency, line-rate throughput with QoS, and linear scalability to accommodate dynamic, agile data movement between nodes, all in a simple and cost-effective way.

With the leaf-spine architecture, the congestion, increased latency, and unpredictable performance caused by traffic jams in the traditional three-tier network are now gone. Within the datacenter, any storage/HCI I/O traverses the ESF in a single hop if the end points are in the same rack, or in three hops if across racks. However, dedicated ESF switches are required to construct the fabric so that storage/HCI traffic, including bursty I/Os and data flows from faster devices such as NVMe SSDs, can always reach the destination with a predictable response time. Using a switch not designed for the demands of an ESF can result in higher and unpredictable network latencies even with the more efficient leaf-spine network design. As mentioned above, that’s exactly what you are trying to avoid for scale-out storage or HCI.

In addition, more and more storage and HCI (and big data and machine learning) platforms employ RDMA over Converged Ethernet (RoCE) to deliver faster network performance and more efficient CPU utilization. As a result, optimized congestion management and QoS are required in the ESF switches to deliver a non-disruptive and transparent network fabric for business-critical applications.

As the leaf-spine architecture makes an ESF extremely easy to scale, the ESF switches need to be simple to configure for fast and easy deployment and scale-out. Automated network provisioning, monitoring, and management are required for virtualized workloads and storage traffic. So is seamless integration with clouds, providing secure, isolated, and agile workspaces for multiple tenants.

Not every datacenter switch can meet these requirements. Dedicated ESF switches such as Mellanox Spectrum™ switches are required.

Mellanox ESF Switches

Mellanox Spectrum switches are storage optimized. They provide line-rate, zero-packet-loss, resilient network performance and enable a high-density, scalable rack design; they are non-blocking both internally and in the number and bandwidth of their uplinks.

  • Ideal port count and optimal form factor. Most scale-out storage or HCI racks contain no more than 16 nodes, so building a new storage or hyperconverged cluster using two standard 32-port or 48-port switches leaves many ports wasted. The Mellanox SN2010/2100 switches provide enough ports in a half-width, 1U form factor, and two of these switches can be installed side-by-side in 1U of rack space for high availability. This makes it possible to house a complete 4-node HCI deployment, switches included, in a 3U appliance, or 16 nodes and two switches in less than half a rack. At the opposite end of the spectrum, using break-out cables, the SN2100 supports up to 64 10/25GbE ports in the same half-width, 1U form factor, enabling the highest-density rack design.
  • Consistent, high performance. An ESF must keep up with faster storage and business-critical applications. NVMe SSDs with Intel Optane (3D XPoint) technology have achieved latencies of 10 microseconds or less. At this level, a few hundred nanoseconds of network latency will significantly impact storage and application performance, especially when traversing multiple switches. Other Ethernet switches often produce latencies in the tens of microseconds in actual deployments, whereas Spectrum switches have ~300ns port-to-port latency and zero packet loss, regardless of frame size and speed. Furthermore, Spectrum switches are designed with a shared buffer, resulting in maximum micro-burst absorption capacity. As shown in the diagram below, Spectrum switches are the only ESF switches supporting the fastest types of storage. Refer to the Tolly Report for more details.

  • RoCE optimization. RoCE (RDMA over Converged Ethernet) is the only way to deliver close-to-local latency for fast storage and in-memory applications such as NVMe-oF, Microsoft Storage Spaces Direct with SMB 3.0, Spark, IBM Spectrum Scale (GPFS), etc. The optimized buffer design in Spectrum, combined with storage-aware QoS and faster Explicit Congestion Notification (ECN as supported in RoCE v2), delivers optimal congestion management for RoCE traffic. For example, Tencent achieved record-setting performance for big-data analytics with Spectrum. Deploying RoCE on Spectrum switches is simple, with three pre-defined profiles for buffer configuration. Refer to this Mellanox community post for more detailed information.
  • Automated network provisioning, monitoring and troubleshooting. Oftentimes, performance, configuration, and support issues in scale-out storage and HCI are network related. Zero-Touch Provisioning is provided through Ansible integration and Mellanox’s network orchestration and management software, NEO™. Ansible Playbooks and NEO not only improve operational efficiency but also eliminate network downtime caused by human error. NEO provides network visibility, performance and health monitoring, plus alerts/notifications that guide storage/HCI administrators in troubleshooting. Being REST API-based, NEO can be easily integrated with scale-out storage or HCI software. For example, NEO is integrated with Nutanix AHV to provide automated VM-level network provisioning.
  • Cloud scale and future proof. Mellanox Spectrum TOR (leaf) and aggregation (spine) switches allow you to scale from half a rack to a full rack, multiple racks, and multiple datacenters. With rich L2/L3 features, including VXLAN/EVPN support for DCI (Data Center Interconnect), and NEO-driven network automation, Spectrum switches bring cloud-scale performance and manageability to scale-out storage and HCI. And by supporting all speeds, including 10/25/40/50/100GbE, the same Spectrum ESF switches that you use today will continue servicing your needs when you migrate to next-generation scale-out storage or HCI platforms that require higher speeds.
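
The latency argument in the bullets above comes down to simple arithmetic. Using the figures cited in this post (~10 µs media latency and ~300 ns per Spectrum hop; 10 µs per hop is assumed here as a representative “tens of microseconds” slow-switch figure), the network’s share of total I/O latency can be estimated:

```python
def io_latency_us(media_us, hops, per_switch_ns):
    """One-way I/O latency in microseconds: storage media plus switch traversals."""
    return media_us + hops * per_switch_ns / 1000.0

# Three hops across racks, NVMe/Optane-class media at ~10 us.
fast = io_latency_us(10.0, 3, 300)      # ~300 ns/hop, cut-through class
slow = io_latency_us(10.0, 3, 10_000)   # assumed 10 us/hop for a slower switch

print(f"{fast:.1f} us total, network share {(fast - 10.0) / fast:.0%}")
print(f"{slow:.1f} us total, network share {(slow - 10.0) / slow:.0%}")
```

With fast switches the network adds well under 10% of the total I/O time; with slow switches it dominates, tripling the effective latency of the flash media.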

In a nutshell, Mellanox Spectrum switches make a perfect foundation for an ESF. Simple to deploy, easy to scale, and free of network bottlenecks, they allow scale-out storage and HCI to truly disaggregate data processing from data location, to achieve performance and scale. And Mellanox Spectrum switches deliver all these benefits in such an efficient way that you can spend less on networking, and more on your data and applications!

You can find more technical details about Mellanox Spectrum Switches in the Mellanox Community.

Follow us on Twitter: @MellanoxTech.

Related Resources

Ethernet Storage Fabric – Part 1

The volume of data, in both structured and unstructured forms, is growing rapidly in datacenters. Applications that generate or consume this data are being developed and deployed across geographic regions. All of this calls for a storage infrastructure that is fast to expand with rapid data growth, agile enough to accommodate application performance requirements and virtualized infrastructure, and efficient to operate at scale. Traditional storage, exemplified by complex, expensive, and proprietary SAN-based storage systems, cannot meet these requirements. As a result, modern datacenters are breaking away from the “Big-Box” storage model and migrating to scale-out, software-defined storage (SDS) and hyperconverged infrastructure (HCI), which are fast to deploy, elastic in scale, and flexible in provisioning.

Scale-out SDS and HCI build on industry-standard servers, with a control plane allocating and managing resource pools on demand. Distributed and software-defined in nature, they are designed to deliver guaranteed performance to applications and services, expand on demand to handle exponential data growth, and simplify operations and management. By removing the bottlenecks and complexity of traditional storage, scale-out storage and HCI lay the foundation for today’s cloud infrastructures – private, public or hybrid – to achieve ultimate cost and operational efficiency while meeting ever-increasing demands for performance and capacity. However, by adopting scale-out storage or HCI, you are only halfway through your data center transformation. The network fabric connecting scale-out storage and converged infrastructure also needs to be “modernized” before one can fully realize these benefits. Herein lies the reason you need an Ethernet Storage Fabric (ESF).

What is an Ethernet Storage Fabric?

An Ethernet Storage Fabric, or ESF in short, is the fastest and most efficient way to network storage. It leverages the speed, flexibility, and cost efficiencies of Ethernet with the best switching hardware and software. It comes packaged in ideal form factors to provide performance, scalability, intelligence, high availability, and simplified management for storage.

An ESF is optimized for scale-out storage and HCI environments because it is designed to handle bursty storage traffic, route data with low latencies, provide predictable performance to maximize data delivery and allow for simplified scale-out storage architectures, and support storage aware services. These are all crucial attributes for today’s business-critical storage environments. In particular, the switches must support new, faster speeds including 25, 50, and 100GbE. They must have an intelligent buffer design that ensures fast, fair, consistent networking performance using any combination of ports, port speeds, and packet size.

Additional ESF attributes include support for not just block and file storage, but also for object based storage, along with storage connectivity for the newest NVMe over Fabric arrays. Additionally, an ESF must provide support for storage offloads, such as RDMA, to free CPU resources and increase performance. Not only is an ESF specifically optimized for storage, but it also provides better performance and value than traditional enterprise storage networks.

I’ve mentioned the need to support simplified scale-out designs, which will be expanded on in Part 2 of this blog.

The Benefits of an Ethernet Storage Fabric

An ESF delivers scale-out storage/HCI traffic in a faster, smarter, and much simpler way.

Faster, Guaranteed Performance. The ESF is a dedicated network fabric for scale-out storage and HCI. The congestion, increased latency, and unpredictable performance caused by traffic aggregation in the traditional three-tier network are now gone. Within the datacenter, any storage/HCI I/O traverses the ESF in a single hop if the end points are in the same rack, or in just three hops if across racks. As long as dedicated ESF switches are used to construct the fabric (we will come back to this point later on), storage and HCI traffic, including bursty I/Os, always reaches its destination with a predictable response time. With RDMA over Converged Ethernet (RoCE) offload and native NVMe over Fabrics (NVMe-oF) acceleration, applications are serviced at the highest performance level, in accordance with SLAs or predefined policies.

Simple to Deploy, Manage, and Scale. Ethernet is ubiquitous in datacenters, and easy and rapid to expand. By converging all network and storage traffic within scale-out storage and HCI environments onto Ethernet, an ESF eliminates network silos (such as the Fibre Channel used with legacy SANs), resulting in a single network fabric to manage. Beyond the boundary of a single datacenter, overlay technologies such as VXLAN/EVPN create efficiencies that allow expansion across multiple datacenters.

Automation, Security, and Storage-aware QoS. An ESF provides automated network provisioning, monitoring, and management for virtualized workloads and storage traffic. Seamlessly integrated with clouds, an ESF supports secure and isolated workspaces for multiple tenants on scale-out storage and HCI. Combined with the intelligence to auto-discover storage devices on the fabric and allocate proper network resources for storage-aware QoS, the ESF delivers a non-disruptive and transparent network fabric that keeps business-critical applications running.

Cost-Effective. Ethernet is the de-facto network in datacenters and clouds. Wide usage and high-volume shipments have driven down hardware cost while ensuring rapid technology innovation and enterprise-class quality. Furthermore, innovative and scalable management tools and automation software for configuration, monitoring, and troubleshooting have grown out of the huge Ethernet networks deployed by both enterprise and cloud customers. These management tools significantly reduce the operational cost of managing scale-out storage and HCI. Easy application migration over a single fabric with automation tools maximizes uptime and resource utilization, further lowering operations costs.

Containers and Docker. The move to modern datacenters is driving new and dynamic operational models, and an ESF must provide a wide range of tools to address these needs. For example, support for Docker containers, which enable software to run in isolation, provides faster and more secure delivery of customized applications, giving customers a unique edge to quickly integrate, improve development cycles, and share storage resources between containers.

Ethernet Storage Fabric vs. Fibre Channel

Naturally, as a seasoned IT professional, you may ask: what has happened to Fibre Channel (FC)? Why is ESF, and not FC, the de-facto fabric in modern datacenters today?

My colleague, John Kim, bravely called out the demise of FC in his inspirational blog a couple of years ago. As shown in the charts on the right, FC port shipments have continued their downward spiral, while Ethernet shipments have kept rising. The technology developments John listed in his blog have already taken place, including the arrival of mainstream 25/100GbE, the emergence of fast storage such as NVMe and 3D XPoint SSDs, the growing adoption of object storage, and the convergence to public, private, and hybrid clouds.

Fibre Channel innovation has stagnated, and it remains a block-only storage solution deployed only in the enterprise; it has no use in the cloud, big data, machine learning, or HCI. ESF is the only fabric adapted for these technological advances within modern and future datacenters and cloud deployments. And it does so at one third the cost of Fibre Channel and at three times the performance.


An Ethernet Storage Fabric (ESF) leverages the speed, flexibility, and cost efficiencies of Ethernet to provide the foundation for the fastest and most efficient way of networking storage. The ESF has everything a traditional SAN offers, but faster, smarter, and much simpler.

An ESF runs on purpose-built switches, optimized to deliver the highest levels of performance, the lowest latencies, and zero packet loss, with unique form factors and storage-aware features. Continue to Part 2 of this blog, where we discuss how to build an ESF network the right way.

You can find more technical details about Mellanox Spectrum Switches in the Mellanox Community.

Follow us on Twitter: @MellanoxTech

Related Resources

How Technology is Reshaping the Playing Field

A major annual football event is looming large on the screens and in the hearts of football fans everywhere. And while many are immersing themselves in the seemingly endless festivities, I find myself thinking of how technology has radically altered the playing field. Technology has been transforming the NFL over the years, but with accelerated development in broadcasting, high-fidelity video playback, virtual and augmented reality, and live data analytics, every aspect of NFL games has been touched. Fans are now moving into an era of an augmented, 3D, 360-degree football experience.

Instant replay: Today, instant replay is reviewed both by referees on the football field and by the officiating staff at the NFL’s Art McNally GameDay Central (AMGC) in New York City, who are hundreds or thousands of miles away, depending on where the game is played. The replay videos, from the best available angles of the telecast, are compiled and ready for review when the referees arrive at the review booth.

Game play: Coaches now make or change their playbooks and game strategies more confidently and accurately, assisted by live analytics of their players’ performance and the plays devised by the opposing team. Meanwhile, 3D, 360-degree comprehensive game play is delivered to the big screen, to the field, and to television screens in fans’ living rooms via the NFL’s FreeD 360-degree replay system.

Fantasy Football: Analytics can now be fed to the game in real time. Rich stats also enable data mining used for scouting (remember “Moneyball”?) and to refine training. In addition, real time reporting and mobility means fans are just a click away from scores, stats and player news. And with digital cameras all over the field, fans can make that highly disputed call all on their own.

All the uncompressed, raw data streaming from cameras throughout the stadium (3K/4K/5K video capture) is fed live to the broadcasting truck on the field, then transmitted to the production studio, and finally broadcast to the review room of the AMGC in NYC for instant replay, and to televisions and portable devices for football fans worldwide. Meanwhile, game stats and live analytics are produced and fed to audiences in real time.


To cope with all these advancements on the field and in the review room, the broadcasting infrastructure is now undergoing a transformation to an IP-based, flexible, future-proof platform at large scale. Starting by packetizing the SDI payload of audio, video, and ancillary data into IP, broadcasters are deploying real-time broadcast systems on today’s Internet infrastructure. However, challenges remain. The live, 3D, 360-degree video streams are extremely sensitive to timing synchronization, demand high bandwidth, and are very susceptible to signal delay and packet loss. This is where the Mellanox Ethernet solution shines, with its lossless network, consistently high throughput, and ultra-low latency: 100G today and 200G soon.

Game ON

Big data analytics, mobility, e-commerce, social media, and the cloud have all already had a huge impact on the games we watch and play. Technology innovation has not only radically altered the way we interact with sports but it continues to change the sports themselves. Of course, the only thing technologies cannot control is which two teams will show up on Feb. 4, 2018. That’s exactly the beauty of sports, isn’t it?

Supporting Resources:

4K and HDR Merge Tinseltown with IP Infrastructure

Market researchers have tagged the global 4K TV market with a 20% CAGR from 2017 to 2025. More and more customers are purchasing bigger-screen TVs for finer image detail, sharper and deeper color, and higher-fidelity sound, aided by the availability of ultra-high-definition media content and the greater affordability of 4K TVs. At the same time, the multitude of viewers and new money-making mechanisms (e.g., interactive advertising, streaming video on demand) drive both traditional broadcasters and internet service providers to vie for a larger market share by offering immersive experiences and rich features over-the-air (OTA) or over-the-top (OTT). The trend is also evidenced by standards ratifications, the latest being ATSC 3.0 (4K, HDR, 120 frames per second, wide color gamut, etc.) in November 2017.

In the background, the technology enabling all this actually reaches far across the media and entertainment industry. From management and distribution of digital content to developments in post-production, the rapid evolution of the entertainment industry is being driven primarily by next-gen IP-based network solutions. It is not only OTT streaming services such as Netflix and Hulu that are tapping into the latest in data center technologies — we’re seeing this revolutionary new model drive innovations in the traditional broadcast production arena as well. Television networks, like BBC, Fox and NBC, are all looking to IP-based networking solutions, which power today’s data centers and clouds, as a means to keep up with the latest trends in the industry. In particular, data center technologies are enabling new video formats such as uncompressed 4K by helping to streamline the content management and distribution model across different platforms and adopt new methods for greater efficiency, performance, and cost savings. These sharper, more vibrant digital formats are pushing data volumes and workloads to quadruple, and even more as the industry heads toward 8K video.
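
The bandwidth pressure described here is easy to work out from first principles. The sketch below computes raw (uncompressed) video bandwidth, assuming 10-bit 4:2:2 sampling (20 bits per pixel on average) and ignoring IP encapsulation overhead:

```python
def uncompressed_gbps(width, height, fps, bits_per_pixel):
    """Raw video bandwidth in Gb/s: pixels per second times bits per pixel."""
    return width * height * fps * bits_per_pixel / 1e9

# 10-bit 4:2:2 sampling averages 20 bits per pixel.
for name, w, h in [("1080p", 1920, 1080), ("4K", 3840, 2160), ("8K", 7680, 4320)]:
    print(f"{name}: {uncompressed_gbps(w, h, 60, 20):.1f} Gb/s at 60 fps")
```

Each step up roughly quadruples the data rate (about 2.5 Gb/s for 1080p60, 10 Gb/s for 4K60, and 40 Gb/s for 8K60), which is why uncompressed 4K workflows already push 10GbE links to their limit and drive studios toward 25/100GbE fabrics.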

Broadcast and production companies must evolve their networks to new, scalable IP-based infrastructure, as demands on existing proprietary SDI routers, coaxial cables, and BNC connectors have been pushed to their limits. These dated technologies can’t support the rapid progress in video and audio quality or emerging distribution models, which do away with proprietary technologies on the consumer end altogether. Working with the Joint Task Force on Networked Media (JT-NM), the Advanced Media Workflow Association (AMWA), and the Society of Motion Picture and Television Engineers (SMPTE), Mellanox has helped to define standards that are shaping the next-gen, end-to-end IP studio via solutions that include Spectrum switches, ConnectX-4® network adapters, and LinkX cables that can meet and scale to ever-increasing demands from content in the digital age.

In tests with Fox Networks, Mellanox Spectrum switches were shown to have the lowest port-to-port latency and packet delay variation in the industry, providing a fabric that’s both reliable and scalable to meet today’s and tomorrow’s demands.

Software-defined architecture can provide greater efficiencies for a broadcasting network. Utilizing OpenFlow over the Studio Control System, a Software Defined Network model can be configured to manage switches and prepare a network for the desired workflow and video routing. By supporting OpenFlow 1.3 with 6,000 ACL-based flows and flexible pipelining, Mellanox Spectrum switches present a best-of-breed OpenFlow solution. Furthermore, we can now containerize IP studio services to run directly on Mellanox Spectrum switches, providing an IP media fabric that doesn’t require additional servers and virtual machines — meaning greater performance and efficiency.

As video processing is extremely CPU-intensive and strictly sequential, intelligent adapters can unlock cost savings and faster processing at various stages of a studio’s content, from development through to digital distribution. Capabilities like kernel bypass, via a solution like Mellanox’s ConnectX line of adapters, offload workloads from CPUs to reduce their packet-processing overhead. Kernel-bypass technologies such as RDMA, Netmap, the Data Plane Development Kit (DPDK), and Mellanox VMA all work to lower jitter and increase throughput, maximizing CPU performance by letting these resources focus on the most critical tasks.

Another risk faced when managing large video flows is the potential for congestion over switch ports, as demand can spike quickly and exhaust the switch buffer. Techniques like packet pacing, which involve both the switch and the server, overcome this challenge by rate-limiting flows while preventing packet loss in the process.
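
Packet pacing is, at its core, rate limiting at the sender. A token-bucket model (a generic sketch of the technique, not Mellanox’s hardware implementation) shows the idea: a packet may only leave when enough tokens have accumulated, so bursts are smoothed to the configured rate instead of being dropped:

```python
class TokenBucket:
    """Token-bucket pacer: bursts up to `burst_bytes`, refilled at `rate` bytes/s."""
    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = float(burst_bytes)
        self.last_ms = 0

    def allow(self, packet_bytes, now_ms):
        # Refill tokens for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now_ms - self.last_ms) * self.rate / 1000)
        self.last_ms = now_ms
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False  # packet must wait: it is delayed, not dropped

# Pace 1500-byte packets arriving 1 ms apart to ~1 MB/s, with a 3000-byte burst.
bucket = TokenBucket(1_000_000, 3000)
sent = [bucket.allow(1500, ms) for ms in range(10)]
```

After the initial burst allowance is spent, the pacer settles into admitting roughly two of every three packets, matching the configured rate; a real implementation queues the delayed packets rather than returning a boolean.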

Last but not least, the performance of cables can mean the difference between a smooth flow of video throughout a network and a massive bottleneck. Mellanox’s LinkX cables offer an ideal solution for IP-based studios, providing a fabric with a high degree of performance and accuracy. Utilizing passive copper and active fiber (VCSEL Silicon Photonics) cables along with optical transceivers, LinkX cables offer industry-leading performance and reliability with power savings and overall cost savings in mind.

The emergence of digital formats for video entertainment presented a game changer for the entertainment industry, providing a completely new paradigm for how content can be created, managed and consumed. With the rapid development of the on-demand/streaming model and constant development of ever-richer ways to view and experience content across different platforms, studios now need to be several steps ahead of technologies just to keep up.

Migrating to an IP-based infrastructure empowers broadcasters to innovate in all the areas of content creation and distribution, multi-platform support and future video formats. With the emergence of open networking and commercial off the shelf (COTS) solutions, the same technology that powers today’s datacenters and clouds provides the underlying foundation for innovations driving the next generation of video entertainment.

Supporting Resources:


Mellanox SN2010 – the Best Hyperconverged Infrastructure Switch

Mellanox’s half-width SN2010 Top-of-Rack (TOR) switch is the best switch for storage and hyperconverged networks. The latest addition to our Spectrum switch family, it is designed for 10/25GbE storage/hyperconverged server clusters with 100GbE uplink connectivity to higher-speed networks. Carrying the signature performance supremacy of Mellanox Spectrum, this 10/25GbE TOR switch provides the ideal combination of performance, rack efficiency, and flexibility to today’s software-defined storage and hyperconverged infrastructure, and presents an easy migration path to next-gen networking.

The migration to software-defined storage (SDS) and hyperconverged infrastructure (HCI) is moving into the mainstream, as the technologies have matured and the adopters ‒ from cloud services providers to small and large enterprises ‒ have benefited from the scalable performance and efficiency as well as the simplicity of deployment and management. As these users became more experienced, they realized that using the right networking to interconnect their SDS and HCI clusters is critical to reap the promised benefits in the most efficient way possible. Not only are they cognizant of using a dedicated network, with predictable and guaranteed performance, to handle storage and HCI data flows, but they have also discerned that using the right switches in the network fabric can significantly impact the efficiency of their storage and hyperconverged infrastructures. The right switches can also simplify and accelerate the migration path of their data centers.

Many users have used SDS and HCI to consolidate their legacy data center silos into a modern data center consisting of racks of server nodes (each node being compute, storage, or hyperconverged). With SDS and HCI, existing x86 servers, often equipped with 10GbE network interfaces, continue being used as part of the infrastructure (a great efficiency story in itself, by the way). As such, racks with 4-16 x86 server nodes, each with 10GbE links to the network, are commonly seen today.

Before Mellanox introduced its unique half-width TOR switches, the SX1012 and the SN2100, customers had to install a 48+4-port switch for the TOR, with more than half of the switch ports unutilized. To make the situation worse, if TOR switch redundancy was required, which was the case for most enterprise deployments, the under-utilization was exacerbated, not to mention the waste in rack space, power consumption, and cooling. For example, two typical old-style switches would have 96 10GbE ports plus 8 uplink ports, but a typical SDS or HCI cluster of 16 servers would only use 32, or one third, of those 10GbE ports. In deployments with confined space and stringent power and airflow requirements, the legacy TOR switches simply turned out to be unusable.

With the half-width, 1RU Mellanox switches, the user can put two units side-by-side in a 1RU rack space, achieving the needed network connectivity and high availability (HA). The same 16-server cluster would achieve HA using 32 of the 36 ports (on two SN2010s), a near-perfect sizing of the switch to the deployment. The following simple comparison is done with a 2U storage or HCI deployment. The rack efficiency achieved is significant.
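The sizing arithmetic above can be double-checked with a short sketch (the helper function is hypothetical; the port counts are the ones quoted in this post):

```python
def port_utilization(servers, links_per_server, ports_per_switch, switches):
    """Return (used, total, fraction) of server-facing ports for a redundant TOR pair."""
    used = servers * links_per_server
    total = ports_per_switch * switches
    return used, total, used / total

# Legacy redundant pair: two 48-port TOR switches, 16 servers, dual 10GbE links each
used, total, frac = port_utilization(16, 2, 48, 2)
print(f"legacy pair: {used}/{total} ports used ({frac:.0%})")   # 32/96 (33%)

# Half-width SN2010 pair: two 18-port switches for the same cluster
used, total, frac = port_utilization(16, 2, 18, 2)
print(f"SN2010 pair: {used}/{total} ports used ({frac:.0%})")   # 32/36 (89%)
```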

Mellanox Ethernet switches are known for their superior performance for data flows in various applications, such as NVMe over Fabrics, fast replication, and fair bandwidth allocation. As both SDS and HCI run I/O-intensive applications using all-flash configurations, high-performance TOR switches become even more important. Interested readers can refer to the Tolly Report for more details. Mellanox also makes network orchestration and automation much easier with its end-to-end network management software, NEO™. With RESTful APIs, NEO is extremely simple to integrate with SDS or HCI management software, enriching network visibility and manageability. A couple of examples are NEO integration with OpenStack Neutron and with Nutanix Prism.
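As a rough illustration of the kind of RESTful integration described here, the snippet below assembles (but does not send) an HTTP request against a hypothetical NEO-style endpoint. The URL path, payload fields, and auth scheme are illustrative assumptions, not the documented NEO API:

```python
import json
from urllib.request import Request

def build_vlan_request(neo_host, switch_ip, interface, vlan_id, token):
    """Build a POST asking the management software to allow a VLAN on a
    switch port. Endpoint path and payload schema are hypothetical."""
    url = f"https://{neo_host}/neo/api/actions/vlan"
    payload = {"switch": switch_ip, "interface": interface,
               "vlan": vlan_id, "mode": "hybrid"}
    return Request(url,
                   data=json.dumps(payload).encode(),
                   headers={"Authorization": f"Bearer {token}",
                            "Content-Type": "application/json"},
                   method="POST")

req = build_vlan_request("neo.example.com", "10.0.0.12", "Eth1/3", 30, "demo-token")
print(req.get_method(), req.full_url)
```

Actually sending the request (for example with `urllib.request.urlopen`) and authenticating would depend on the specific NEO deployment.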


Heeding the needs of SDS and HCI users with 10GbE server connections, Mellanox recently announced the availability of its new half-width switch, the SN2010, built on the Spectrum ASIC specifically for 10GbE SDS and HCI use cases. The SN2010 provides 18 10/25GbE SFP28 ports for server connectivity and 4 40/100GbE QSFP28 ports for uplinks to the main network. In its half-width, 1RU form factor, the SN2010 consumes 57 Watts typical (ATIS). The SN2010 is future-proofed by supporting 25GbE on the same 18 SFP28 ports: when the user is ready to migrate to 25GbE servers, which is already on the horizon, the same SN2010 switches remain viable. Lastly, the SN2010 supports both MLNX-OS and Cumulus Linux.

For a storage or HCI rack with up to 18 servers, the SN2010 TOR switches are the perfect fit for 10GbE today and 25GbE in the near future. And at Mellanox, we always believe "LESS is MORE": less power consumption, lower latency, a smaller footprint, and a lower price are all ways to bring more value to customers.

A side note: if the user does not use all 18 10GbE ports, the spare ports can connect the 1GbE management interfaces on the servers using the 1GbE transceiver from Mellanox. In this case, extra savings are achieved by eliminating the need for a dedicated 1GbE management switch. See the chart below for all the connectivity options.


Our SDS and HCI partners are obviously taking note. Here are some of their comments:

"The new 10G-optimized Mellanox SN2010 TOR switch completes the Mellanox Spectrum switch line as the efficient and flexible network fabric for modern storage systems," said Marty Lans, Senior Director & General Manager, Storage Connectivity and Interoperability, HPE. "A great follow-on to the SN2100 switches that connect the current 10GbE networked storage to next-gen networks."

“Nutanix has a distinct vision to transform organizations’ datacenter operations with an Enterprise Cloud OS that reduces hardware footprint and improves scalability and TCO,” said Venugopal Pai, Vice President of Strategic Alliances and Business Development, Nutanix. “The new space-efficient SN2010 further complements the joint solutions our teams will provide for customers together and strengthens our partnership with Mellanox.”

It’s clear that the rapidly growing market for SDS and HCI has created the need for 10/25GbE switches optimized for making these deployments both high-performance and efficient. The new Mellanox SN2010 meets these needs, making it the perfect switch for hyperconverged infrastructure and software-defined storage clusters connecting at 10GbE or 25GbE.

Supporting Resources:

Ignite Your Microsoft Software-Defined Datacenter with DataON and Mellanox

DataON TracSystem, certified for Microsoft Windows Server, is a fully integrated, turnkey hyper-converged solution with Windows Server 2016 Storage Spaces Direct. Powered by the end-to-end Mellanox RDMA over Converged Ethernet (RoCE) solution and all-flash NVMe SSDs, DataON TracSystem achieves exceptional performance of 3M IOPS in a four-node configuration. It also delivers simplicity, scalability, automation, and affordability to the vast number of enterprise customers who have made the "Microsoft choice".

For enterprises that have built their IT infrastructure with Microsoft solutions (Windows Servers, Microsoft business applications, etc.), there has been a long, weary wait for a data center solution that delivers Microsoft Azure cloud-like agility and efficiency for their on-premises data centers. Now, the wait is over. DataON, a key partner in the Microsoft WSSD program, has made available the latest Windows Server 2016 with Storage Spaces Direct on its hyper-converged (HCI) platform, TracSystem. DataON TracSystem delivers on the promise of performance, scalability, and simplicity of software-defined datacenters. The same benefits provided by services from large public clouds are now available for consumption on premises.

Built on Windows Server 2016 and Storage Spaces Direct, which are also the building blocks of the Microsoft Azure Cloud, the DataON HCI solution brings virtualization (of compute, storage, and networking), automation, and security to enterprise data centers. The resulting mobility and scalability enable customers to accommodate their growing business needs in a pay-as-you-grow fashion, while maintaining existing applications without compromising security. Substantial efficiencies are achieved through cost reductions from less expensive hardware and through operational automation with policy-based provisioning and orchestration. With Storage Spaces Direct, DataON TracSystem provides tiered storage that is flexible enough to meet performance, capacity, and budget requirements. For customers concerned about running their business-critical applications on a software-defined platform, TracSystem, with the RDMA support in SMB3, delivers breakthrough performance for even the most performance-demanding business applications.

Comprised of DataON S2D Server Ready Nodes which are field-proven with over 600 customers and over 100PB deployed, DataON TracSystem is performance-tuned, and delivers incremental compute, networking and storage resources while providing linear scalability on demand.

  • 3M IOPS for a 4-node cluster with all-NVMe SSDs
  • 40+ Hyper-V VMs per node and up to 16 nodes per cluster

TracSystem is also simple-to-deploy and self-serviced with DataON MUST as the monitoring and management tool. First to market for Windows Server 2016 deployments, and fully integrated with the Windows Storage Health Service API (SM-API), DataON MUST provides a single-pane-of-glass view of your WSSD datacenter for provisioning, monitoring, and troubleshooting.


Mellanox 100GbE End-to-End RoCE Solution

The RDMA network fabric in DataON HCI TracSystem is provided with the Mellanox 10/25/40/50/100GbE end-to-end RoCE solution. The end-to-end Mellanox networking delivers high bandwidth and low latency that unlocks the power of fast flash storage and accelerates I/O intensive applications.

In the network fabric, the Mellanox Spectrum switches provide non-blocking switching with no packet loss. On the server, the Mellanox ConnectX®-4 network adapter cards offload RDMA and network virtualization functions from the CPU. Combined with advanced congestion management, Mellanox networking enables the industry’s most reliable, low-latency SMB3 RDMA fabric, delivering twice the throughput of TCP/IP, less than 1 µsec of VM-to-VM latency, and fewer CPU cycles per I/O with better core utilization.


Join Us at Microsoft Ignite!

DataON will demonstrate its latest TracSystem Lightning platform 5224L, with Mellanox 100GbE end-to-end RoCE, at booth #1726 at Microsoft Ignite (Sept. 25-29 in Orlando, Florida). Find out more about the latest and complete offering from DataON at this sneak preview.


Additional Resources:



Automated Network Provisioning for VMs with Mellanox and Nutanix

Applications in enterprise clouds are virtualized, running in virtual machines (VMs) or containers hosted on physical servers.

This allows cloud applications to utilize the most optimal resources available, use them only when needed, and share resources to achieve the best efficiency. For instance, compute-intensive applications run on VMs residing on servers equipped with powerful CPUs and lots of memory. Storage-heavy applications run on VMs with lots of local storage; and when resource availability or needs change, applications on VMs are migrated live to a different host, with no downtime or disruption.

In parallel, enterprises are employing hyper-converged infrastructure for their clouds. Hyper-converged infrastructure natively converges storage and compute into standard x86 servers. These x86 servers, containing local direct-attached storage, are clustered into a software-defined platform that allocates resources to VMs running on these servers in the most efficient way.

Such application mobility, scalability, and availability in your enterprise cloud over a hyper-converged platform must be supported by a network infrastructure that is high-performance, easy to scale, and highly available. With all of these elements in place (the best hyper-converged infrastructure and a network with these qualities), you have now built your enterprise cloud.

But you are not done yet.

Enterprise Clouds Should be Automated.

As the saying goes, time is money. Don’t you want to complete building your cloud in hours, rather than days or weeks? Deploy an application on a VM with one mouse click? Or migrate an application transparently from a failing node to a good node?

Requiring faster deployment times with no tolerance for business disruption means that manual network reconfiguration is not just costly; it will fail your enterprise cloud and hyper-converged infrastructure badly. And this doesn’t even take into account the hundreds or thousands of applications and VMs you need to manage…

Your cloud is incomplete until you build in networking automation – automated provisioning, automated management, automated recovery, etc.

For the remainder of this blog, I will show you how Mellanox NEO™ network management software works seamlessly with Nutanix Prism™ infrastructure management software to provide VM-level network visibility and automated network provisioning. With just one click, everything simply works when you spin up, migrate, or retire a VM for your application.

NEO + Prism = Automated Network Provisioning for VMs

Nutanix Prism is a centralized infrastructure management solution for virtualized datacenter environments. It brings unprecedented simplicity by managing the entire stack from the storage and compute infrastructure all the way up to virtual machines (VMs). Key features of Prism include storage management, VM management, network virtualization, and virtual network management.

Mellanox NEO is a powerful platform, designed to simplify network provisioning, monitoring and operations of the modern data center. NEO offers robust automation capabilities from network staging and bring-up, to day-to-day operations.

To complement the advanced features of Nutanix Prism for running virtual workloads, Mellanox NEO adds another layer of seamless orchestration and management for the underlying network fabric.

Through deep API-to-API integration, NEO subscribes to Prism’s event notifications and receives real-time event notifications upon VM creation, migration, and deletion. Every time a new VM is spun up through the Prism console, NEO is alerted and automates the creation of the corresponding network on the physical switch where the new VM is provisioned. The same automation applies to changes, migration, and deletion of existing VM workloads. Furthermore, NEO adds the capability to visualize the network fabric at the VM level.
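The event-to-configuration flow can be sketched as follows. The event shapes and the command template are illustrative assumptions (not the actual Prism webhook schema or NEO internals), but the resulting CLI line mirrors the switch command quoted later in this post:

```python
# Sketch of event-driven VLAN provisioning: map a VM lifecycle event from the
# infrastructure manager to the switch-port VLAN change the fabric manager
# would push. Event fields and command syntax are illustrative assumptions.

def handle_vm_event(event, host_to_port):
    """Return the switch CLI change implied by a VM lifecycle event, or None."""
    port = host_to_port[event["host"]]
    vlan = event["vlan"]
    if event["type"] in ("vm.create", "vm.migrate.in"):
        return f"interface ethernet {port} switchport hybrid allowed-vlan add {vlan}"
    if event["type"] in ("vm.delete", "vm.migrate.out"):
        return f"interface ethernet {port} switchport hybrid allowed-vlan remove {vlan}"
    return None  # event types we don't act on

ports = {"NTNX-Block-1-D": "1/3"}  # which switch port each node is cabled to
print(handle_vm_event({"type": "vm.create", "host": "NTNX-Block-1-D", "vlan": 30}, ports))
# interface ethernet 1/3 switchport hybrid allowed-vlan add 30
```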

Automated Network Provisioning for VM Creation

The following NEO screen captures illustrate how NEO automates network provisioning when a VM is created in Prism.


Figure 1. NEO displays the network map of four Nutanix nodes connected through a Mellanox SN2100 switch.




Figure 2. A new VM “colo_nj_web01” is created on “NTNX-Block-1-D” in Prism.




Figure 3. NEO automatically configures the VLAN for the newly created VM, upon the notification from Prism.


Without this automation, the network administrator would have to execute the following command at the CLI console of the switch:

Interface Ethernet 1/1/3 switchport hybrid allowed-vlan add 30


Figure 4. The information about the newly created VM “colo_nj_web01” is automatically displayed under device “NTNX-Block-1-D”.


Similarly, NEO automates the network configuration changes when a VM is migrated from one Nutanix node to another. Watch more on NEO network automation for VMs in this YouTube video.

In summary, the Prism and NEO integration automates network provisioning tasks and eliminates costly, time-consuming manual operations. As a result, the VM, and with it the business application, is always on the right part of the infrastructure with no disruption. Mellanox NEO supports this network automation capability for Mellanox switches running the Mellanox network operating system as well as those running the Cumulus Linux network operating system. Leveraging Nutanix APIs to create this added level of visibility and business continuity brings huge benefits to Nutanix enterprise cloud customers.

We are demonstrating this great utility at the Mellanox booth (#S6) at the Nutanix .NEXT User Conference in Washington D.C., June 28-30. Come visit us and discover how to make your Nutanix enterprise cloud simple and efficient with Mellanox networking.

Supporting Materials:


Nutanix Enterprise Cloud for Your Business Critical Applications

With Guaranteed Performance, Continuous Availability, and Automated Services from Mellanox Networking

The Nutanix Enterprise Cloud Platform is accelerating enterprises’ adoption of cloud in their IT infrastructure. This cloud transformation of the workplace promises compelling benefits: agile application and service delivery, a simplified and scalable IT infrastructure, automated management, and pay-as-you-grow economics, all of which ultimately improve efficiency and significantly reduce costs, both CapEx and OpEx.

The Dilemma of Enterprise Cloud

For all these benefits, most enterprise customers have embraced the cloud and migrated many of their applications to a cloud infrastructure, including virtual desktop infrastructure (VDI), web services, email servers, and remote/branch office IT. However, business-critical applications, which tend to be performance-sensitive (especially around latency) and require 24×7 non-disruptive uptime, have been kept out of the cloud. IT departments worry that running business-critical applications on a cluster of virtualized, distributed, software-defined servers in the cloud cannot deliver the needed levels of performance, reliability, and availability. They thus run business-critical applications on dedicated, bare-metal servers.

A common example is OLTP workloads on Oracle RAC, which are very latency-sensitive. Transactions always need to be completed as stipulated, at any time and at any transaction volume. (In 2016, Alibaba hit over 1 billion transactions on Singles’ Day, peaking at over 100,000 transactions per second through its Wallet app.) At the same time, Oracle RAC, based on clustered servers with shared storage, is very susceptible to the latency overhead of ownership transfer between nodes during write transactions. These ownership transfers traverse the cluster network, so with many instances running over a large cluster, any extra network latency can significantly degrade overall application performance.

How To Cloudify Business Critical Applications

Is there a solution that provides the enterprise cloud benefits of scalability, flexibility, and efficiency, while also meeting the performance, reliability, and availability guarantees required for business-critical applications?

The Nutanix enterprise cloud solution, with Mellanox networking, is exactly that.


Figure 1. Nutanix Enterprise Cloud Platform with Mellanox Leaf-Spine Network

Built on its hyper-converged architecture, the Nutanix solution converges the entire enterprise data center into a fully integrated enterprise cloud platform, replacing the legacy infrastructure of separate compute servers, storage arrays, and storage network. This Nutanix enterprise cloud platform consists of Acropolis™ storage and virtualization services, Prism™ data center automation and analytics, and Mellanox network switches. The Acropolis-based data plane natively converges compute and storage resources. With capabilities of data locality, intelligent tiering, automatic disk balancing and data reduction, Acropolis provides performance acceleration and capacity optimization for storage services and virtualization services especially when run from the native Acropolis hypervisor (AHV).

While the Nutanix Acropolis, which includes high availability, data protection and security capabilities, provides enterprise-class performance, availability and reliability for storage and virtualization services, the Mellanox switch fabric delivers guaranteed low-latency network performance for the Nutanix cluster. This is critical for running business critical applications.

The requirement for the network in this case is simple but a tall order to fill: provide sufficient bandwidth and negligible added latency for any data transfer between the nodes. In other words, the network needs to be so reliable, fast, and consistent that it appears transparent to compute, storage, and virtualization services. This ensures that access times to local data and to data distributed elsewhere in the cluster are not discernibly different, and that moving an application from one node to another causes no disruption to business tasks. Such network transparency needs to be maintained regardless of workloads and data volumes. Building such a network is very challenging, but Mellanox simply delivers the best in transparent networking.

Leveraging technologies proven in the demanding world of high-performance computing (HPC), Mellanox blends its core competence of non-blocking switching, zero packet loss, and consistently low latency with the larger, dynamically shared buffer found in the state-of-the-art Spectrum switch line. These Spectrum SN2000 switches provide line-rate throughput and consistent 300ns port-to-port latency at network speeds from 10Gb/s up to 100Gb/s and at any packet size. Combined with fair traffic distribution, optimal microburst absorption, and smart congestion management, the Mellanox switches make the network transparent to even the most stringent business-critical applications. The consistently low latency of the Spectrum switches is illustrated in the chart below; the full Tolly report is also available.


Figure 2. Consistently low latency of Mellanox Spectrum switches


As illustrated in Figure 1, the Nutanix Enterprise Cloud Platform uses the leaf-spine architecture based on Mellanox 100 GbE Spectrum switches to achieve predictable, low latency at any packet size, with linear scalability and lower management overhead than traditional three-tier network infrastructures. High throughput and scalability of the Mellanox switches enable the network to supply sufficient bandwidth for the high-performance NVMe-based SSDs to perform optimally without impacting other applications or services. With performance that actually exceeds bare-metal solutions based on specialized proprietary hardware, the Nutanix and Mellanox solution allows businesses to adhere to stricter service-level agreements, achieve greater responsiveness, and deliver an improved user experience for their business critical applications.

Obviously, Spectrum switches provide more than line-rate throughput and low latency. Spectrum switches make great TOR switches or leaf-spine switches for Nutanix enterprise cloud deployments, small and large. Below are a few more relevant blogs and solution briefs for interested readers:

And more technical content about rack solution design using Mellanox Spectrum Switches is available on the Mellanox and Nutanix websites.

Mellanox is sponsoring the upcoming Nutanix .NEXT User Conference in Washington D.C., June 28-30. Come visit us (Booth #S6) and discover how to make your Nutanix enterprise cloud simple and efficient with Mellanox networking.

Follow us on Twitter: @MellanoxTech.

Networking Your Nutanix Enterprise Cloud To Scale

Leaf-Spine Architecture with Mellanox Networking Builds Scalable and Efficient Infrastructure

Your enterprise cloud on the hyper-converged platform is built to scale. As you grow your business with more customers and new services, your enterprise cloud has to meet your business needs for both today and the future. Can your current network infrastructure also scale efficiently to accommodate future business needs? Keep in mind that it’s always more expensive to change when you have a fully operational network already in place.

There is a good chance that your current network is built on a Three-Tier Architecture. It is fairly simple to physically expand your network when applications are running on dedicated physical servers.

The three-tier architecture consists of the access layer, where servers are connected; the aggregation layer, where the access switches are connected upstream; and the core layer, which connects everything. When more servers are connected to the access layer, you add access switches to physically expand the switch ports at L2 if needed. This is quite straightforward: all you need to do is calculate the switch ports required and check the rate of over-subscription to the upstream network for sufficient bandwidth.
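The over-subscription check mentioned above is simple arithmetic: downstream (server-facing) bandwidth divided by upstream (uplink) bandwidth. A small sketch with hypothetical port counts:

```python
def oversubscription(server_ports, server_gbps, uplink_ports, uplink_gbps):
    """Ratio of server-facing bandwidth to uplink bandwidth for one access switch."""
    return (server_ports * server_gbps) / (uplink_ports * uplink_gbps)

# Hypothetical access switch: 48 x 10GbE server ports, 4 x 40GbE uplinks
print(f"{oversubscription(48, 10, 4, 40):.1f}:1")  # 3.0:1
# A 3:1 ratio can be acceptable for mostly north-south traffic, but the
# east-west-heavy traffic of SDS/HCI clusters saturates the uplinks quickly.
```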

Much of the data in this framework is processed within, and remains in, its dedicated domain (L2 segment). When a service in one physical domain needs to reach another domain, the traffic often flows north-south. For example, the request from the web server goes upstream to the aggregation and core layers and then travels down to the database server in another physical L2 segment. The response data traverses the three layers in the same fashion. But this network topology cannot cope with the scalability and performance demands of hyper-converged infrastructure in modern data centers.

With hyper-converged infrastructure, a cluster of x86 servers is “glued” together by a software control plane to form unified compute and storage pools. All applications are virtualized to run in a virtual machine (VM) or container and are distributed (and migrated) across the cluster using policy-based automation. Application I/Os are managed at the VM level, but physical data is distributed across the cluster in a single storage pool.

Access to the shared storage, data protection mechanisms (replication, backup, and recovery), and VM migration for load balancing now generate a deluge of network traffic between the nodes in the cluster, the so-called east-west traffic.

Now, the three-tier architecture reaches its limit and breaks down.

For the traffic switched within the L2 segment, the commonly used spanning-tree protocol (STP) takes its toll, because disabling redundant links to break loops results in severe under-utilization of link capacity. Adding link capacity to accommodate the east-west traffic is quite expensive and is saddled with low efficiency.

For a large cluster that spans multiple racks and L2 segments, the traffic has to go through the aggregation and core layers, which increases latency. This large amount of upstream traffic leads to a higher rate of oversubscription from the access layer to the aggregation and core layers, which inevitably causes congestion and degraded, unpredictable performance.

For storage I/Os, degraded and unpredictable performance presents the worst scenario possible.

Because of these architectural shortcomings, modern data centers are adopting the leaf-spine architecture instead. Constructed with just two layers, leaf (access) and spine, the leaf-spine architecture has a simple topology in which every leaf switch is directly connected to every spine switch.

In this topology, any pair of endpoints communicates through a single spine hop, which ensures consistent and predictable latency. By using OSPF or BGP with ECMP, your network utilizes all available links and achieves maximal link-capacity utilization. Furthermore, adding more links between each leaf and the spines can provide additional bandwidth between leaf switches.
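With ECMP, each flow is hashed onto one of the equal-cost leaf-to-spine links, so distinct flows spread across all spines while packets within a flow stay in order. The sketch below illustrates the idea with a generic hash; real switch ASICs use their own hash functions and field selections:

```python
import hashlib

def ecmp_spine(src_ip, dst_ip, src_port, dst_port, num_spines):
    """Pick the spine for a flow by hashing its addressing fields (illustrative)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_spines

# 100 distinct flows between two leaf-attached hosts, 4 spines available
chosen = [ecmp_spine("10.0.1.5", "10.0.2.9", p, 443, 4) for p in range(1000, 1100)]
print("spines used:", sorted(set(chosen)))  # flows spread across the spines
# All packets of one flow always hash to the same spine, preserving ordering.
```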

In addition, the use of overlay technologies such as VXLAN can further increase efficiency. As a result, the leaf-spine architecture also delivers optimal and predictable network performance for hyper-converged infrastructure.

In a nutshell, the leaf-spine architecture provides maximal link capacity utilization, optimal and predictable performance and the best scalability possible to accommodate dynamic, agile data movement between nodes on hyper-converged infrastructure. For this reason, it is only fitting that the leaf-spine network is constructed with Mellanox Spectrum™ switches which provide line-rate, resilient network performance and enable a high-density, scalable rack design.

Mellanox Spectrum switches deliver non-blocking, line-rate performance at link speeds from 10Gb/s to 100Gb/s at any frame size. In particular, the 16-port SN2100 Spectrum switch is the most versatile TOR switch, in a half-width, 1RU form factor.

The 16 ports on the SN2100 can run at 10, 25, 40, 50, or 100Gb/s. When more switch ports are needed, you can expand a single physical port into four 10 or 25Gb/s ports using breakout cables. The SN2100 can therefore be configured as a 16-port 10/25Gb/s switch, or as a 48-port 10/25Gb/s switch with four 40/100Gb/s ports for uplinks.
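The breakout arithmetic works out as follows (a small sketch of the port math quoted above; the helper is hypothetical):

```python
def sn2100_port_plan(breakout_ports):
    """Access/uplink port counts when `breakout_ports` of the SN2100's 16
    physical ports are split 4-ways into 10/25Gb/s lanes via breakout cables."""
    TOTAL_PORTS = 16
    uplinks = TOTAL_PORTS - breakout_ports  # remaining ports stay 40/100Gb/s
    access = breakout_ports * 4             # each breakout yields 4 x 10/25Gb/s
    return access, uplinks

access, uplinks = sn2100_port_plan(12)
print(f"{access} x 10/25Gb/s access ports, {uplinks} x 40/100Gb/s uplinks")
# 48 x 10/25Gb/s access ports, 4 x 40/100Gb/s uplinks
```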

The half-width form factor of the SN2100 allows you to install two of them side-by-side in a 1RU rack space and run MLAG (Multi-chassis Link Aggregation Groups) between them to create a highly available L2 fabric. Configuring link aggregation between the physical switch ports and the hyper-converged appliances utilizes all physical network connections to actively load-balance VMs, a key advantage particularly in all-flash clusters.

It’s also worth pointing out that 100Gb/s uplinks available on Spectrum switches offer more link capacity between leaf and spine switches, which is very useful with all-flash-based platforms.

More details are illustrated in the recently published solution note by Nutanix. As the leading enterprise cloud solution provider, Nutanix sees more and more customers migrate their data centers to Nutanix hyper-converged platforms, from SMBs with a half-rack deployment to large enterprise customers whose cloud spans across multiple racks. Customers are consolidating more intensive workloads to their clouds and starting to use faster flash storage. For these Nutanix-based enterprise cloud deployments, “Designing and implementing a resilient and scalable network architecture ensures consistent performance and availability when scaling.”

“Mellanox switches allow you to create a network fabric that offers predictable, low-latency switching while achieving maximum throughput and linear scalability,” noted Krishna Kattumadam, Sr. Director Solutions and Performance Engineering at Nutanix.

“Investing in, and deploying a Mellanox solution, future-proofs your network, ensures that it can support advances in network interface cards beyond the scope of 10 GbE NICs (to 25, 40, 50, or 100 GbE and beyond),” continued Krishna Kattumadam. “Coupled with a software-defined networking solution, Mellanox network switches offer such benefits as manageability, scalability, performance, and security, while delivering a unified network architecture with lower OpEx.”

If you are architecting your network for a Nutanix enterprise cloud, the Nutanix solution note presents solutions that can help you achieve scale and density with Mellanox networking. I will leave much for you to read, and would like to conclude this blog with the following network diagrams. As shown, SN2100s fit in nicely with the right port count for half-rack deployment of 4-12 nodes, typical of SMBs. When the data center grows and more server nodes are added, the same SN2100 switches can also support a full-rack deployment up to 24 nodes. For large enterprise cloud deployments consisting of multiple racks, Mellanox Spectrum switches can scale easily in a spine-leaf topology with great efficiency.


You can find more technical details about rack solution design using Mellanox Spectrum Switches in the Mellanox Community.

Follow us on Twitter: @MellanoxTech