All posts by Karthik Mandakolathur

About Karthik Mandakolathur

Karthik is a Senior Director of Product Marketing at Mellanox. Karthik has been in the networking industry for over 15 years. Before joining Mellanox, he held product management and engineering positions at Cisco, Broadcom and Brocade. He holds multiple U.S. patents in the area of high performance switching architectures. He earned an MBA from The Wharton School, MSEE from Stanford and BSEE from Indian Institute of Technology.

Introducing Test, a five-star deep learning infrastructure program from Trace3, NetApp, Mellanox, Nvidia and Flexential.

Test drive your five-star deep learning infrastructure today!

When we search for a hotel in an unfamiliar place, we tend to decide based on the rating. A five-star rating would assure us of a certain level of security, comfort, and quality. In the same way, as businesses adopt new technologies such as deep learning infrastructure, a five-star infrastructure would assure a certain level of performance, quality, and convenience.

While the benefits of deep learning are obvious, selecting the right components, configuring the infrastructure for peak performance, and provisioning the power and space can be a daunting task for first-time adopters. Five industry leaders (Flexential, Nvidia, Mellanox, NetApp, and Trace3) are teaming up to launch the Test drive program. The goal of the Test drive program is to help customers benefit from deep learning without having to worry about data privacy or fine-tuning the infrastructure.

How Does the Test Drive Program Work?

Flexential colocation data centers host ready-to-use racks comprising Nvidia DGX-1/DGX-2 systems and NetApp AFF A800 all-flash storage systems, interconnected by Mellanox Spectrum SN2700 Ethernet switches. The infrastructure is pre-configured and fine-tuned for peak performance by Trace3. Customers can securely connect to the infrastructure and start using popular machine learning frameworks right away in a familiar environment such as Jupyter notebooks.

The Infrastructure

The core infrastructure is based on the recently published NetApp ONTAP AI reference architecture with Mellanox Spectrum Ethernet switches.

Spectrum Ethernet switches are optimized for storage and deep learning. Spectrum switches:

  • Provide a robust, low-latency, high-bandwidth RoCE-based data path.
  • Boost distributed system performance by sharing the bandwidth fairly and by implementing smart end-to-end hardware-accelerated congestion management mechanisms.
  • Dramatically reduce mean time to issue resolution and simplify operations with support for advanced telemetry technology.

The Bottom Line

Flexential, Mellanox, Nvidia, NetApp, and Trace3 have teamed up to launch the Test drive program. The Test drive program provides customers with best-of-breed, high-performance, professionally run infrastructure. With Test drive, customers can focus on their deep learning workloads without having to worry about purchasing, racking, stacking and tuning the infrastructure.


Deep Learning Infrastructure built right

In any good system design, it is important to maximize the performance of the most critical (and often the most expensive) component. In the case of Deep Learning infrastructures, the performance of the specialized compute elements such as GPUs must be maximized.

GPUs have improved compute performance 300-fold! Even so, deep learning training workloads are so resource-intensive that they need to be distributed and scaled out across multiple GPUs. In such a distributed environment, the network is the critical part of the infrastructure that determines the overall system performance. Using legacy networks for Deep Learning workloads is like trying to drive a race car through a traffic jam. A race car requires a highly tuned, banked race track to run at top speed!

What attributes make an interconnect ideal for Deep Learning Infrastructure?

Consistent Performance

Distributed Deep Learning workloads are characterized by a huge thirst for data and the need to communicate intermediate results between all nodes on a regular basis to keep the applications from stalling. As such, significant performance gains are possible using high-bandwidth, hardware-accelerated RDMA over Converged Ethernet (RoCE) based GPU-to-GPU communications that support broadcast, scatter, gather, reduce, and all-to-all patterns. GPUs also need to read and process enormous volumes of training data from storage endpoints. The interconnect fabric that glues the distributed system together should reliably and quickly transport the communication packets between the GPUs and between GPUs and storage.

Mellanox Spectrum Ethernet switches deliver high bandwidth and consistent performance with:

  • Fully-shared Monolithic Packet Buffer that provides better traffic burst absorption capabilities, without dropping packets
  • Consistent cut-through performance and low latency
  • Intelligent Explicit Congestion Notification (ECN) based congestion management control loop to regulate traffic at a granular flow level to mitigate congestion.
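To make the ECN control loop concrete, here is a minimal sketch of the general idea: the switch marks packets with a probability that rises with queue depth, and the sender slows down in proportion to the marks it sees. The thresholds (`k_min`, `k_max`, `p_max`) and the rate-adjustment rule are illustrative assumptions in the style of RED marking and DCTCP-like reaction. They are not Spectrum's actual parameters or algorithm.

```python
def ecn_mark_probability(queue_depth, k_min, k_max, p_max):
    """RED-style ECN marking probability for a given queue depth.

    queue_depth, k_min and k_max share the same unit (for example KB).
    All thresholds are hypothetical values chosen for illustration.
    """
    if queue_depth <= k_min:
        return 0.0              # queue is short: never mark
    if queue_depth >= k_max:
        return 1.0              # queue is long: mark every packet
    # Linear ramp between the two thresholds, capped at p_max.
    return p_max * (queue_depth - k_min) / (k_max - k_min)


def adjust_rate(rate_gbps, marked_fraction, line_rate_gbps=100.0, step_gbps=1.0):
    """Simplified sender reaction to ECN feedback (an assumption, not a standard).

    Cut the rate in proportion to the fraction of marked packets,
    otherwise probe upward by a fixed step until line rate is reached.
    """
    if marked_fraction > 0:
        return max(step_gbps, rate_gbps * (1 - marked_fraction / 2))
    return min(line_rate_gbps, rate_gbps + step_gbps)
```

Because marking starts before the buffer fills, senders back off early and per flow, which is what lets the fabric avoid both packet drops and premature pause frames.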

Commodity off-the-shelf merchant silicon-based switches use fragmented packet buffers made of small packet buffer slices that are unable to absorb high-bandwidth traffic bursts. The congestion management mechanisms are broken in switches that have these fragmented buffers. Additionally, without a tight ECN congestion management mechanism, such switches aggravate congestion by sending pause frames prematurely and blocking the network. With an unregulated flow of traffic and packet drops, commodity switches are unable to deliver the consistent low latencies required to maximize Deep Learning cluster performance.

Intelligent Load Balancing

Distributed Deep Learning systems should be well balanced to bring forth best in class scale-out performance. Leaf-Spine networks leverage Layer-3 Equal Cost Multi-Path (ECMP) to balance and deliver high cross-sectional bandwidth necessary for scaling out. Mellanox Spectrum Ethernet switches enable high cross-sectional bandwidth:

  • Spectrum-based switches utilize their high-performance Packet buffer architecture to share available switch bandwidth fairly across ports
  • Spectrum implements flexible packet header hashing algorithms that enable it to evenly distribute traffic flows across Layer-3 Equal Cost Multi Paths
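The hashing idea in the last bullet can be sketched in a few lines. The example below hashes a flow's 5-tuple and uses the result to pick one of several equal-cost next hops. CRC32 and the chosen header fields are assumptions made for illustration; the text above does not describe the actual hash functions or field selections Spectrum uses. The spine names and IP addresses are hypothetical.

```python
import zlib


def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    """Pick a next hop for a flow by hashing its 5-tuple.

    Every packet of the same flow hashes to the same value, so a flow
    stays on one path (no reordering) while different flows spread out.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return next_hops[zlib.crc32(key) % len(next_hops)]


# Hypothetical leaf with four spine uplinks.
spines = ["spine1", "spine2", "spine3", "spine4"]
counts = {s: 0 for s in spines}

# 4000 flows that differ only in source port (UDP 4791 is the RoCEv2 port).
for port in range(10000, 14000):
    hop = ecmp_next_hop("10.0.0.1", "10.0.1.1", port, 4791, 17, spines)
    counts[hop] += 1

print(counts)
```

How evenly the flows land depends on the hash and on which header fields vary, which is why a flexible choice of hash inputs matters for workloads with few, large flows.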

Commodity off-the-shelf merchant silicon-based switches have fairness issues that can result in traffic imbalance. For example, in a simple 3:1 oversubscription test with three senders sending traffic to the same destination, one of the senders often hogs 50% of the bandwidth, leaving each of the other nodes with only ~17%. These performance variations caused by traffic imbalance can, in turn, degrade overall distributed system performance.
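One way to put a number on this imbalance is Jain's fairness index, which is 1.0 when every sender gets the same share and drops toward 1/n as one sender dominates. The sketch below applies it to the shares quoted above, taken exactly as stated. Choosing Jain's index is an assumption made here for illustration; the original test may have been scored differently.

```python
def jain_fairness(shares):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 for perfectly equal shares. The index is scale-invariant,
    so the shares do not need to add up to 100%.
    """
    total = sum(shares)
    return total * total / (len(shares) * sum(s * s for s in shares))


fair = jain_fairness([1 / 3, 1 / 3, 1 / 3])    # ideal 3:1 result
skewed = jain_fairness([0.50, 0.17, 0.17])     # shares quoted in the text

print(f"fair={fair:.3f} skewed={skewed:.3f}")
```

With the quoted shares the index comes out at roughly 0.76, compared with 1.0 for an even three-way split.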

Comprehensive Visibility for Deep Learning Infrastructure

It is critical to keep Deep Learning Infrastructure up and running to get the most out of it. Having native and built-in telemetry in the interconnect will also help with capacity planning and improve resource utilization.

With Mellanox What Just Happened™ (WJH), network operators can dramatically improve mean time to issue resolution and increase uptime. Mellanox Spectrum Ethernet switches provide rich contextual and event-based telemetry data that can help quickly drill down into application performance issues. With Mellanox WJH, operators can monitor infrastructure utilization, remove performance bottlenecks and plan resource capacity.

Commodity off-the-shelf merchant silicon-based switches are not designed to provide granular network visibility. As a result, network operators are forced to collect data centrally and apply predictive methods that can only guess at the root cause of issues. This creates a centralized choke point, and such solutions cannot efficiently scale to support 25/100GbE speeds.

The Bottom Line

The network is the critical element that unleashes the power of specialized Deep Learning infrastructure. Mellanox Spectrum Ethernet switches, with consistent performance, intelligent load balancing, and comprehensive visibility, are the ideal interconnect for Deep Learning applications. Use Mellanox Spectrum Ethernet switches to build your Deep Learning Infrastructure.

Mellanox Debuts in the 2018 Gartner Magic Quadrant for Datacenter Networking

In the Q1 2018 earnings call, Mellanox reported that its Ethernet switch product line revenue more than doubled year over year. Mellanox Spectrum Ethernet switches are getting strong traction in the data center market. The recent inclusion in the Gartner Magic Quadrant is yet another milestone. There are a few underlying market trends that are driving this strong adoption.

Network Disaggregation has gone mainstream

Mellanox Spectrum switches are based on the company's highly differentiated homegrown silicon technology. Mellanox disaggregates Ethernet switches by investing heavily in open source technology, software and partnerships. In fact, Mellanox is the only switch vendor among the top 10 contributors to open source Linux. In addition to native Linux, Spectrum switches can run the Cumulus Linux or Mellanox Onyx operating systems. Network disaggregation brings transparent pricing and gives customers the choice to build their infrastructure with the best silicon and the best fit-for-purpose software for their specific needs.

25/100GbE is the new 10/40GbE

25/100GbE infrastructure provides better ROI, and the market is adopting these newer speeds at a record pace. Mellanox Spectrum silicon outperforms other 25GbE switches in the market in terms of every standard switch attribute.

Ethernet Storage Fabric

Ethernet-based storage technologies are replacing legacy Fibre Channel-based storage. Distributed storage applications are bandwidth intensive. The network has to keep pace with the recent performance improvements made by the storage endpoints. Spectrum supports line rate traffic with zero packet loss and consistent low latency. RDMA over Converged Ethernet (RoCE) is often used to achieve much higher throughput. The interconnect performance can be further fine-tuned using Explicit Congestion Notification (ECN). Spectrum implements a robust ECN mechanism that proactively minimizes congestion and avoids blocking.

The Bottom Line

Mellanox Spectrum Ethernet switches are gaining strong traction in the market, fueled by the need for better and more predictable performance. With network disaggregation and open Ethernet, Spectrum enables customers to build their high-performance infrastructure with the best silicon in the market at an affordable cost. With all this positive momentum in the market, it is no surprise that Mellanox Spectrum is part of the 2018 Gartner Magic Quadrant for Datacenter Networking.



Network Disaggregation – A Winning Strategy that is Ready Now!

Renowned Harvard Business School Professor and author of the “Theory of Disruptive Innovation”, Clayton Christensen in an interview said:

“During the early stages of an industry, when the functionality and reliability of a product isn’t yet adequate to meet customer’s needs, a proprietary solution is almost always the right solution — because it allows you to knit all the pieces together in an optimized way.

“But once the technology matures and becomes good enough, industry standards emerge. That leads to the standardization of interfaces, which lets companies specialize on pieces of the overall system, and the product becomes modular. At that point, the competitive advantage of the early leader dissipates, and the ability to make money migrates to whoever controls the performance-defining subsystem.”

The network industry is mature and, as a natural consequence, network systems are disaggregating. Network disaggregation gives customers the freedom to choose switch hardware and the network operating system from different vendors. This is very much like buying a server and then installing your choice of operating system on it. The proliferation of disaggregated switching platforms is putting pressure on the legacy networking vendors. Traditional vendors have been forced to disaggregate for select large customers. However, it is hard for these legacy vendors to disaggregate the network for the larger market, as it would adversely affect their revenue and profit margin. It simply would not fit their business model.

Mellanox pioneered the Open Ethernet approach to network disaggregation. Today, Mellanox Spectrum Ethernet switches have support for the widest range of open network operating systems. At the 2018 OCP, Mellanox demonstrated switches running Mellanox Onyx, Microsoft SONiC, Facebook FBOSS and Cumulus Linux. The demonstrations also showcased Apstra’s intent-based networking software, which interfaces with network operating systems to deliver turnkey infrastructure operations, troubleshooting and automation.

What sets Mellanox Spectrum Open Ethernet switches apart?

  1. Best of breed functionality

Mellanox specializes in building high-performance, feature-rich data center switching platforms. Mellanox Spectrum Ethernet switches flexibly support 1/10/25/40/50/100GbE speeds, line rate performance, and consistent low latency. Spectrum has the best support for VXLAN and RDMA over Converged Ethernet (RoCE) available.

Even though there are many Ethernet switch vendors, all of them build systems using the identical switch chip with the same crippling performance constraints. With Network Disaggregation and Spectrum Open Ethernet switches, customers are no longer constrained by the limitations imposed by legacy chipsets.

  2. Freedom of choice without vendor lock-in

Buying traditional switches is buying into vendor lock-in, where customers pay more and get less. For example, vendor transceiver pricing is one of the biggest areas where legacy vendors “rip off” customers. Legacy vendor lock-in tactics can cost customers as much as 50% more for optical transceivers. Some vendors use questionable tactics to force their products into the supply chain.

Legacy vendors charge software license fees for every little feature, including dynamic routing protocols. Often unsuspecting customers fall for the “bait and switch” tactics used by legacy vendors.

Mellanox open Ethernet switches have a transparent pricing model with no special software licenses and support for third-party optics. With a disaggregated model customers can enjoy true freedom of technology choice – hardware, software, and data center architecture.

  3. Broadest choice of open and fit for purpose Network Operating System (NOS)

Not all Ethernet switches get deployed in the same way, but traditional vendors attempt to address all use cases with a single NOS. Some switches may be used in a virtualized cloud network environment, others as a network packet broker or as a bare-metal Linux appliance. Mellanox Spectrum Ethernet switches support a wide variety of “fit for purpose” operating systems. For example:

  1. Mellanox Onyx is the NOS of choice for customers who are used to a classic CLI-based interface with Enterprise-class Layer-2 and Layer-3 features. Onyx is also the preferred operating system for Storage, Machine Learning, and Network Packet Broker applications.
  2. Cumulus Linux is the NOS of choice for customers that are DevOps centric, prefer a Linux-based interface, or are interested in VXLAN-based Network Virtualization.
  3. Spectrum Linux Switch is the right choice for customers who want to build their own switch systems on top of a completely open Linux operating system. Customers can access switch functionality with standard Linux APIs. They can install their choice of open-source modules on top to build their system their own way.

The Bottom Line

Traditional vendors do not have a business model to support network disaggregation for the general market. Traditional closed system based supply chain favors the legacy vendors with inferior technology. Break free from the limitations of traditional networking. Disaggregate your network and build it with Mellanox Spectrum Open Ethernet Switches and choose the networking software that best suits your needs. Mellanox Spectrum switches have the highest performance and support the widest choice of open, fit-for-purpose Network Operating Systems.


Migrate to 25/100 Gigabit Ethernet — Get the Best Datacenter Return on Investment

Gartner in a recent report says, “In the Data Center, Just Say ’No‘ to Chassis-Based Switches and 10G/40G Switches to Save 70%”.

At Mellanox, we have built the industry’s leading 25/100G Ethernet network adapters, cables, transceivers and fixed form-factor datacenter switches. With over 65% market share, Mellanox is the market leader in the 25G+ high-speed adapter market, leading by a huge margin with the ConnectX network adapters.

Customers increasingly are realizing the advantages of Mellanox 25/100G Spectrum Ethernet switches, and adopting them for their Storage, Artificial Intelligence, and Cloud infrastructures. With more interconnect bandwidth, datacenter operators can worry less about network bottlenecks and focus more on building their mission critical applications.

Let us explore the top four reasons why you should follow the trend and move to a 25/100G infrastructure:

1. 2.5X better server connectivity at the same price point

According to a recently published Crehan Research long-range market forecast report (January 2018), the average selling price (ASP) of 25G adapter ports is dropping fast and will soon dip below that of a 10G port. Similarly, Crehan predicts 25G Ethernet leaf switch ports should have a less than 10% premium over 10G, while reaching the crossing point in 2019.

With 2.5X better performance and no price premium, there is no longer any reason to buy a 10G adapter or leaf switch. Move to 25G now!

Source: January 2018 CREHAN Long-range Forecast – Server Class Adapters

2. 25/100G for an efficient and cost-effective data center cable plant

25G SerDes technology allows more bits to pass through a single strand of copper/fiber cabling. A 100G link would need 10 strands of cable when used with legacy 10G SerDes technology. With the 25G SerDes, 100G links require only 4 strands (see figure below). This fundamental improvement allows customers to extract 2.5X better performance with pretty much the same cabling infrastructure. In other words, use fewer cables by adopting higher speeds.

Source: Article in eetimes by Mellanox
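The lane arithmetic behind this point is simple enough to check in a few lines. The helper below is a sketch written for this post (the function name is made up); it just divides the link speed by the per-lane SerDes rate and rounds up.

```python
import math


def lanes_needed(link_gbps, serdes_gbps):
    """Number of SerDes lanes (cable strands) needed for a given link speed."""
    return math.ceil(link_gbps / serdes_gbps)


legacy = lanes_needed(100, 10)   # 100G link built from 10G lanes
modern = lanes_needed(100, 25)   # 100G link built from 25G lanes

print(legacy, modern, legacy / modern)
```

Ten lanes versus four is where the 2.5X figure comes from: the same four-lane cabling that carries 40G with 10G SerDes carries 100G with 25G SerDes.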

3. 100 Gigabit Ethernet leaf-spine connectivity reduces overall infrastructure costs

40 Gigabit Ethernet is a waning technology and is inefficient in terms of switch faceplate utilization. Contrast this with higher-density 100G spine switches that can cost-effectively pack more bandwidth per rack unit. 100G is more cost-efficient on a per-gigabit-per-second basis. The January 2018 publication of the Crehan Long-range Forecast – Data Center Switch predicts that 2018 will be the crossing point at which the 40G Ethernet switch port ASP becomes higher than that of a 100G switch port. The main point is that 100G-capable spine switches deliver 2.5X better performance at a lower price point.

4. Future-proof your network to extract better storage/server resource utilization

There are several datacenter trends that necessitate 25G server connectivity; a few examples include:

  1. Server utilization has increased with the popularity of virtualization and container-based microservices and applications, which use a serverless model. With this increased utilization, you need more than 10G connectivity to the server.
  2. Distributed Storage with high-performance storage endpoints.
  3. Analytics workloads that may use bandwidth-intensive technologies such as GPUDirect.

The Bottom Line

25/100G Ethernet infrastructures are both efficient and better performing at a lower price point in comparison to 10/40G infrastructures. Thus, going forward, there really is no reason to choose 10/40G interconnect solutions.

Mellanox Spectrum Ethernet Switches, ConnectX adapters, BlueField SoC and LinkX cables deliver industry-leading performance and the most comprehensive 25/100G end-to-end solutions. Make the right choice and future-proof your data center with Mellanox.


Top-3 Ethernet Interconnect Considerations for your Machine Learning Infrastructure

Mellanox Spectrum 100GbE Ethernet Switches are ideal for Machine Learning workloads. In this article, we will explore Machine Learning training workload attributes and how they relate to Spectrum’s capabilities.

How are Machine Learning training workloads different?

  1. Calculations are approximate
    The training process is statistical in nature. Tradeoffs between infrastructure speed and algorithm accuracy are often made.
  2. Computation is iterative
    The model parameter optimization is typically done to minimize model prediction error for the given training data set using an iterative method like gradient descent.
  3. Datasets are typically too big to fit in a single server
    It is typical for models to have tens of millions of parameters and tens of billions of samples. It would take years to train such a model on a single CPU server, or months on a single off-the-shelf GPU. The solution is to build a scale-out infrastructure and distribute the load across several worker nodes.

Building high-performance ML infrastructure is a delicate balancing act

We can scale out the workload and distribute the subsets of the data to ‘m’ different worker nodes (See figure above). Each worker node can work on a subset of the data and develop a local model. However, local models work only on a subset of data and can go out of sync (and increase the error rate). In order to prevent this from happening, all worker instances need to work in lockstep with each other and periodically merge the models. The parameter server is responsible for merging the models.

Scaling out will parallelize the computation and will shorten the time for a single iteration. However, scaling out will also increase the number of independent worker nodes and hence will increase the error rate. As the error rate increases, more iterations will be needed to converge. At some point, the increased number of iterations needed will wipe out the benefits obtained from scaling out.

The optimal solution is to scale out to the point where it is still beneficial and then focus on other ways of extracting performance from the infrastructure.
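A toy model makes this tradeoff easier to see. In the sketch below, adding workers shortens each iteration but adds a fixed synchronization cost and inflates the number of iterations needed to converge. Every constant (`base_iterations`, `compute_per_iter`, `sync_cost`, `iteration_penalty`) and the linear penalty itself are made-up assumptions for illustration, not measurements from any real training job.

```python
def training_time(workers,
                  base_iterations=1000,
                  compute_per_iter=64.0,
                  sync_cost=1.0,
                  iteration_penalty=0.05):
    """Toy estimate of total training time for a given worker count.

    All parameters are hypothetical. Iterations to converge grow linearly
    with the number of workers, compute per iteration shrinks with it,
    and each iteration pays a fixed cost to merge the models.
    """
    iterations = base_iterations * (1 + iteration_penalty * (workers - 1))
    time_per_iteration = compute_per_iter / workers + sync_cost
    return iterations * time_per_iteration


# Search worker counts from 1 to 128 for the lowest total time.
best = min(range(1, 129), key=training_time)

for m in (1, best, 128):
    print(m, round(training_time(m)))
```

In this model the total time falls at first, bottoms out at a moderate worker count, and then rises again. Lowering `sync_cost`, which is where the interconnect comes in, lowers the whole curve.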

Mellanox Spectrum 100GbE Ethernet Switches are ideal for Machine Learning workloads

The exchange of millions of model parameters requires enormous network bandwidth. Mellanox Spectrum switches provide just that, with support for line-rate 32x100GbE performance with zero packet loss.

The worker nodes can work independently for a few iterations but need to repeatedly sync up and work in lockstep in order to converge. Consistent low latency is important in distributed systems where the individual processes work in lockstep. Jitter makes the system inefficient, as the entire distributed system slows down waiting for the node that experiences the worst latency. Mellanox Spectrum switches support line rate traffic with consistent low latency and low jitter.
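The effect of jitter on a lockstep system can be shown with a short simulation. Each synchronized step finishes only when the slowest worker finishes, so the step time is the maximum of the per-worker latencies, not the average. The uniform jitter distribution and all the numbers below are assumptions chosen for illustration.

```python
import random


def mean_step_time(workers, jitter, base=1.0, steps=1000, seed=0):
    """Average duration of a synchronized step across `steps` rounds.

    Each worker takes `base` plus a random delay in [0, jitter].
    The step completes when the slowest worker is done.
    """
    rng = random.Random(seed)   # fixed seed keeps runs repeatable
    total = 0.0
    for _ in range(steps):
        total += max(base + rng.uniform(0, jitter) for _ in range(workers))
    return total / steps


no_jitter = mean_step_time(workers=16, jitter=0.0)
one_worker = mean_step_time(workers=1, jitter=0.5)
sixteen_workers = mean_step_time(workers=16, jitter=0.5)

print(no_jitter, one_worker, sixteen_workers)
```

With the same jitter per node, sixteen workers in lockstep are slower on average than a single worker, because the chance that at least one node hits a bad latency grows with the node count.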

The TCP/IP stack does not meet the performance needs of ML/AI workloads. RDMA over Converged Ethernet (RoCE) is proven to be the right choice for high-performance distributed workloads such as ML. Mellanox Spectrum provides robust support for RoCE. Additionally, Mellanox Spectrum has visibility and “easy button” automation knobs to help users enable RoCE.

Bottom line

As workloads evolve, network infrastructure needs to evolve. Mellanox Spectrum Ethernet switches are the right choice to build your high-performance Machine Learning infrastructure because they support:

  1. Reliable line rate 100GbE
  2. Consistently low latency
  3. Robust RoCE

In addition, Mellanox Spectrum has the right hooks to support visibility, automation and orchestration tools. No wonder cloud service providers around the world are picking Mellanox Ethernet solutions to build their AI infrastructure.


Mellanox Introduces Next Generation Ethernet Network Operating System – Mellanox Onyx™

Mellanox is excited to announce the general availability of Mellanox Onyx. Onyx succeeds MLNX-OS as the next-generation network operating system for Spectrum Open Ethernet switches. Onyx combines rich network operating system features for the modern datacenter with the flexibility of a completely open container-based framework.

Onyx combines the best of Classic & Disaggregated Network Operating Systems

As datacenters scale, operators seek to reduce network complexity by eliminating legacy proprietary features. Standards-based scale-out Layer-3 leaf/spine disaggregated fabrics have emerged as the proven and de facto approach to building large datacenter networks. The number of protocols and features that run the datacenters keeps shrinking. The focus is shifting to network automation, orchestration and visibility solutions.

Existing disaggregated solutions are Linux based and offer a great number of open source DevOps tooling options, but many customers still prefer to have a classic CLI. However, most classic network operating systems are closed and provide only limited access to the switch platform. So, customers have to make a difficult choice between not having a classic CLI and having limited system access. With Mellanox Onyx, you can have the best of both worlds.

The Onyx Advantage

Onyx supports all the elements required to build a robust Layer-2 and Layer-3 data center network.

  • Industry-proven Layer-3 stack
  • ZTP, Ansible, Puppet
  • 64-way ECMP
  • Robust MLAG

End-to-End Fabric Management

Onyx tightly integrates with Mellanox NEO™ orchestration software, extending management throughout the entire fabric. NEO provides a single pane of glass to manage the interconnect infrastructure – including both the Spectrum switches and the ConnectX adapters.


Additionally, Onyx supports commands to automate end-to-end QoS and buffer configurations. RDMA is the preferred transport for Storage, NVMe over Fabrics and Machine Learning applications. Onyx’s enhanced support for RoCE cements the Spectrum Ethernet switch’s position as the best platform for Storage and Artificial Intelligence applications.

Enhanced Visibility

Onyx also has visibility hooks to help network operators debug and troubleshoot issues. For example, large networks have tens of thousands of physical cables. Onyx comes with visibility features to proactively monitor the Bit Error Rate (BER) of optical links and notify operators. Moreover, Onyx leverages the underlying Spectrum switch platform to give insights into buffer and congestion issues.

Container-Based Framework

With Onyx, you can run containerized applications side-by-side with the main operating system. The container application can:

  • Interact with Onyx via JSON API
  • Send/Receive packets from network
  • Access silicon-level SDK

Customers using the container infrastructure can quickly and simply implement features or customize the switch functionality to address use cases such as Media & Entertainment or Storage.

The Bottom Line

With Mellanox Onyx, customers enjoy the benefits of an open system while using standard protocols and features. Onyx supports industry-standard Layer-2/Layer-3 features and automation tools. In addition to these core features, Onyx provides a container-based infrastructure that customers can use to further customize the switch platform.

To learn more about Mellanox Onyx, download the product brief.


Top 3 considerations for picking your BGP EVPN VXLAN infrastructure

The Mellanox team had the pleasure of discussing EVPN VXLAN at the Network Field Day session 17 held at Mellanox. To learn more, view the session recording. Below is our view of the top 3 considerations network operators should have before picking a solution:

1. Network Virtualization Without the Fine Print

◊ Fine print to watch out for with other vendors:

  • VXLAN routing is supported BUT ONLY with loopback cables
  • ONLY symmetric VXLAN routing is supported
  • ONLY asymmetric VXLAN routing is supported
  • ONLY 10GbE products support VXLAN routing

Mellanox Spectrum supports VXLAN routing in a single pass at 100/25GbE line rate. The solution supports both symmetric and asymmetric routing. Take a look at Ixia demonstrating VXLAN routing with IxNetwork and Mellanox Spectrum. Additionally, this solution works seamlessly with both RoCE (RDMA over Converged Ethernet) and TCP/IP traffic.

2. Performance at scale

◊ Common Performance/scale bottlenecks to watch out for with other vendors:

  • Oversubscribed buffer architecture
  • EVPN fabric can only grow up to 64 racks
  • Tunnel scale is low and non-deterministic

Mellanox Spectrum supports line rate 100GbE/25GbE switching with predictable cut-through low latency, intelligent congestion management and traffic fairness. Spectrum based EVPN fabrics can stretch up to 750 racks. Other solutions in the market typically have a limit of 64 or 128. Also, Spectrum can support up to 100K VXLAN tunnels.

3. Affordable EVPN VXLAN

◊ Common hidden costs to watch for with other vendors:

  • “Enhanced License” cost for dynamic routing protocol support
  • “Virtualization License” cost to enable VXLAN
  • “Automation license” for ZTP
  • “Visibility license” for simple network monitoring
  • Only relabeled optical transceivers are supported
  • Clunky and expensive systems built of low capacity router chipsets

Spectrum Ethernet switch ASICs integrate rich VXLAN functionality with high-capacity packet switching. Thanks to this silicon integration, Mellanox is able to provide feature-rich VXLAN EVPN functionality at an affordable price. In addition, the base systems with Cumulus Linux come with dynamic routing protocols and EVPN VXLAN functionality built in. There are no additional licenses needed.

Build your affordable EVPN VXLAN fabric that works with Mellanox Spectrum and Cumulus Linux.


Spectrum Ethernet Storage Switches Unleash Revolutionary Storage Performance

Panama: The canal that unlocked the world a century ago is doing it again. After nine years of toil, and a whopping $5.8B in investments, the wider canal opened in mid-2016. Before the expansion, large ships carrying natural gas from the U.S. East Coast had to travel all the way around the tip of South America to reach China, Korea and Japan. Expanding the canal created more efficient routes to East Asia and Australia from the U.S. East Coast as well as Europe. With more efficient routes to their destinations, many cargo businesses in the U.S. are expecting to grow at double the usual rate. Clearly, a better interconnected planet promotes efficiency and adds business value.

Just like a wider canal was needed to enable efficient cargo transportation and unlock latent business potential, a high bandwidth and fast Ethernet Storage Fabric is needed to unleash revolutionary storage performance. In the rest of this blog, we will explore Spectrum-based Ethernet Storage Switches that improve overall system performance, and do so much better than any typical deep buffer switch.

We are witnessing the widespread adoption of high performance, direct-attached storage devices; namely solid state drives (SSDs). These storage devices have tremendous performance but are physically stranded within individual servers. The solution often uses a high performance Ethernet Storage Fabric to share these high performance resources across multiple servers. The performance of an Ethernet Storage Fabric is heavily influenced by the switches used in the fabric. When you dig into the details of the silicon, there are two types of switches in the market that target storage applications:

  1. Switches with on-chip buffers
  2. Switches with off-chip buffers (aka ultra-deep buffered switches)

To summarize a recent blog covering the topic of on-chip buffers: Spectrum has better burst absorption capability due to its monolithic on-chip packet buffer architecture. Legacy switches have oversubscribed and fragmented on-chip buffers that lead to sub-optimal burst absorption capability. Spectrum offers line rate non-blocking performance with zero packet loss, as its buffers support line rate bandwidth.

In the second part of that blog series, we discussed why switches with off-chip buffers are not suitable for high performance applications, as they are implemented with slow but huge DRAM modules. Since they are much slower than on-chip buffers, multiple DRAM instances have to be used in each switch to provide a reasonable packet buffer bandwidth. This results in switches with gigantic but oversubscribed/blocking buffers—also known as a deep buffer switch. Because these ultra-deep buffered switches are blocking in nature, they introduce jitter and thus are not a good fit for high performance storage interconnect applications in the datacenter.

Now, we will cover a few example storage scenarios to contrast Spectrum attributes and its performance vis-à-vis legacy solutions.

  1. Why should you care about oversubscription and packet loss?

Storage applications demand high bandwidth. For example, Intel’s Ruler SSDs can pack a 1U server platform with one petabyte of storage. In the case of distributed storage, this petabyte of data should be replicated at least two additional times across the network to provide redundancy. A non-oversubscribed, line rate, zero loss and high bandwidth network is therefore needed to move petabytes of data around. Mellanox Spectrum provides just that (See results here).
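To get a feel for the volume involved, here is a rough back-of-the-envelope sketch of how long two extra replicas of one petabyte would take to cross a single link running at full line rate. The 100GbE link speed, the decimal petabyte, and the function name are assumptions chosen for illustration; they are not measured Spectrum results.

```python
def replication_time_hours(data_bytes: float, extra_copies: int, link_gbps: float) -> float:
    """Hours needed to send `extra_copies` replicas of `data_bytes` over one link at line rate."""
    bits_to_move = data_bytes * 8 * extra_copies
    seconds = bits_to_move / (link_gbps * 1e9)
    return seconds / 3600


PETABYTE = 1e15  # decimal petabyte (assumption)

# Two additional replicas of 1 PB over a single 100GbE link (assumed link speed).
hours = replication_time_hours(PETABYTE, extra_copies=2, link_gbps=100)
print(f"{hours:.1f} hours at line rate")
```

Even in this idealized case the transfer takes well over a day, so any oversubscription or packet loss in the fabric only stretches it further.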

Packet loss and unpredictable interconnect performance complicate distributed storage systems. For example, if an initiator does not get a response back for a write request, it does not know whether the write request itself was dropped, the response from the target was dropped on the way back (See Figure 1), or the network is just being slow. More handshakes and communication would need to happen in order to ensure that the write request is committed into the storage target.

Figure 1: Mellanox Spectrum Zero packet loss, high performance storage fabric is perfect for an Ethernet storage switch.


  2. Why should you care about fairness?

Storage backup, data replication and sharding-related traffic are typically bursty in nature and run in the background. These heavy-duty background flows should not block other interactive sessions that run over the same fabric. It is important that the storage fabric functions fairly, without much interference between unrelated ports. Mellanox Spectrum supports non-blocking, line-rate packet switching, which helps ensure that each port gets the same line-rate performance and all connections are treated fairly. Other common switches in the market today, including the “ultra-deep” buffered switches, are oversubscribed and blocking, so they cannot maintain line rate on all ports. As a result, some workloads get better performance than others, depending on which ports they are using.

  3. Why should you care about jitter?

Mellanox Spectrum provides consistent low cut-through latency across all packet sizes. There are switches in the market today with 4GB packet buffers, and a full 4GB buffer can introduce a whopping 3.2 second delay on a congested 10GbE port. Multiple hops through deep buffered switches can introduce significant delay and variation in latency. (Variation in latency is known as jitter and is very undesirable.) This unpredictable performance with ultra-deep buffers will upset storage performance. Figure 2 gives an example of how storage performance can be impacted by a delayed response packet.

Figure 2: Mellanox Spectrum provides consistent low latency across all packet sizes.
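The 3.2 second figure quoted above is simply the time a congested port needs to drain a completely full buffer. A minimal sketch of that arithmetic, assuming decimal units (1 GB = 10^9 bytes) and a hypothetical helper name:

```python
def max_queuing_delay_s(buffer_bytes: float, port_gbps: float) -> float:
    """Worst-case time a packet waits behind a full buffer draining through one port."""
    buffer_bits = buffer_bytes * 8
    return buffer_bits / (port_gbps * 1e9)


# 4 GB of buffered packets queued behind a single congested 10GbE port.
delay = max_queuing_delay_s(4e9, port_gbps=10)
print(f"{delay:.1f} s")
```

The same helper shows why the problem compounds across hops: each deep buffered switch on the path can add its own worst-case delay.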



Faster storage needs faster networks. Mellanox Spectrum, with its high bandwidth, zero packet loss and consistent low latency, is the ideal switch for high performance applications such as storage inside the datacenter. Spectrum includes a monolithic, high-performance on-chip buffer to ensure non-blocking performance, good burst absorption, low jitter, and fairness. Legacy switches with on-chip buffers do not have enough burst absorption capability to support Ethernet Storage Fabrics. Legacy switches with off-chip (deep) buffers are blocking in nature, introduce jitter and are not suitable for Ethernet Storage Fabrics either. In other words, a deep buffer switch is a poor choice for an Ethernet storage switch. Given the clear advantages of Spectrum’s design and performance, it’s no wonder HPE Storage recently picked Mellanox Spectrum technology to power their StoreFabric M-series switches.

Supporting Resources:

Network Disaggregation – Does Your Switch have the Right Packet Buffer Architecture? (Part 2)

Gigantic buffers can hold packets for a very long time and make network troubleshooting difficult. Issues related to big buffers such as this and buffer bloat have been covered in the past. The focus of this blog is ultra-deep buffer implementation, related architectural choices and performance implications. Part 1 of this two-part blog series already covered on-chip packet buffer architectures.

Chipsets with ultra-deep buffers were originally designed to address the modular switch market. These chipsets with external buffers are not relevant to storage, big data and other high performance applications inside the datacenter. Let us get more into the details.

The challenge with implementing ultra-deep buffers

Let us take a 1.0Tbps switch on chip as an example. Ideally, the ultra-deep buffer should be able to support 1.0Tbps of packet writes and 1.0Tbps of packet reads. The buffer is built using commodity DRAMs that were originally designed for the server market. Using these DRAMs as a packet buffer poses the following challenges:

  1. DRAMs are SO SLOW (low bandwidth).

In order to build a buffer that is functional with reasonable bandwidth, multiple (e.g. 8) DRAM banks have to be used (See Figure 1). With multiple banks, the external memory will be capable of absorbing several hundred Gbps of traffic. However, you still have to live with some oversubscription, as even multiple DRAM banks typically cannot support full line rate 1.0Tbps traffic.

  2. DRAMs are SO BIG (high capacity).

The minimum size of a DIMM is around 512MB. If you use 8 instances, you get a 4GB buffer whether you want the capacity or not.


Figure 1: Eight instances of slow DRAMs needed to provide 512Gbps bandwidth
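The arithmetic behind these two points can be sketched as follows. The per-instance values are assumptions obtained by dividing the totals quoted above (4GB and 512Gbps) by eight, and the comparison against 1.0Tbps considers packet writes only, which is a simplification that flatters the external buffer.

```python
INSTANCES = 8                 # DRAM instances per switch chip (from the example above)
TOTAL_CAPACITY_GB = 4         # resulting "ultra-deep" buffer size
TOTAL_BANDWIDTH_GBPS = 512    # aggregate DRAM bandwidth shown in Figure 1
SWITCH_GBPS = 1000            # 1.0Tbps of packet writes the buffer should ideally absorb

# Per-instance values implied by the totals (assumed, binary megabytes).
per_instance_mb = TOTAL_CAPACITY_GB * 1024 / INSTANCES
per_instance_gbps = TOTAL_BANDWIDTH_GBPS / INSTANCES

# How far the external buffer falls short of line rate, counting writes only.
oversubscription = SWITCH_GBPS / TOTAL_BANDWIDTH_GBPS

print(f"{per_instance_mb:.0f} MB and {per_instance_gbps:.0f} Gbps per DRAM instance")
print(f"Buffer oversubscription (writes only): {oversubscription:.2f}:1")
```

Counting the 1.0Tbps of packet reads as well would roughly double the shortfall.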


Now that some vendors have switches with a 4GB buffer, they sell them as “ultra-deep” buffer switches for storage and Big Data applications. But these platforms are really not relevant for these high performance applications running inside the data center. Let me get into more details on why:

  1. Ultra-deep buffers do not translate to better burst absorption

Platforms with ultra-deep buffers have multiple bottlenecks and have an oversubscribed buffer architecture (See Figure 1). As a result, packets can be dropped indiscriminately even before classification and forwarding lookup. Said otherwise, these switches will be blocking in nature and will have port interference issues. The performance will degrade further once the packet touches the slower external memory. What is the point in having “ultra-deep” buffers if one cannot guarantee non-blocking traffic switching and better burst absorption?

Figure 2: Multiple bottlenecks in external packet buffer-based switches


  2. Ultra-deep buffer switches exhibit higher latency and jitter

Today’s ultra-deep buffer platforms do not support cut-through switching. High performance applications such as storage typically have a higher proportion of large packets. These large packets will incur extra latency due to the store-and-forward function. Also, switching packet data accesses between on-chip SRAM and off-chip DRAM will introduce more jitter.
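To see why large packets are hit hardest, note that a store-and-forward switch must receive an entire packet before it can start transmitting it, so each hop adds at least one full serialization time. The sketch below estimates that minimum per-hop penalty; the packet sizes and port speeds are illustrative assumptions, not measurements of any particular switch.

```python
def store_and_forward_delay_us(packet_bytes: int, port_gbps: float) -> float:
    """Minimum extra per-hop delay, in microseconds, from receiving the whole packet first."""
    packet_bits = packet_bytes * 8
    # 1 Gbps moves 1000 bits per microsecond.
    return packet_bits / (port_gbps * 1000)


# Illustrative packet sizes: a small packet, a standard MTU frame, a jumbo frame.
for size in (64, 1500, 9000):
    for speed in (10, 100):
        delay = store_and_forward_delay_us(size, speed)
        print(f"{size:>5} B at {speed:>3} GbE: {delay:.3f} us per hop")
```

The penalty grows linearly with packet size, which is why storage traffic with many large packets benefits most from cut-through switching.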

  3. Ultra-deep buffer switches lack density and are less reliable

External packet memory is slow, putting a limit on the throughput that can be provided by a single switching element. Vendors have to use multiple switching elements (e.g., 6) to reach a density of 32x100GbE. Also, these switch chips need several dedicated interface pins to connect to the external DRAMs, and they use expensive silicon real estate to implement thousands of queues that are only relevant to WAN applications. These chips are sub-optimal for high performance applications inside the datacenter. More components translate into reduced reliability and more power.

The Bottom line

The ultra-deep buffered switches in the market today are not a good fit for storage, big data and high performance applications running inside the datacenter. Mellanox Spectrum switches have a robust monolithic on-chip buffer with support for advanced congestion management capabilities, including smart ECN. With its smart buffers, predictable low latency and support for non-blocking line rate 32x100GbE, Mellanox Spectrum is a great choice for Hadoop, machine learning, analytics, storage and other high performance applications running in the datacenter.


Supporting Resources: