All posts by Bill Webb

About Bill Webb

Bill Webb is Director of Ethernet Switching – Americas at NVIDIA Networking. In this role, he evangelizes the benefits of NVIDIA’s Mellanox Ethernet switch portfolio in scale-out data center, storage, cloud computing, and AI/accelerated deep learning environments. Bill has spent over 20 years in the networking industry in a variety of sales, engineering, and management roles. Prior to NVIDIA, Bill worked at Concurrent (now Vecima), where he introduced a Ceph-based scale-out storage product for media streaming, and at Ciena, where he led a team developing first-generation Software Defined Networking applications. Bill started his career at Nortel Networks and later worked at several start-ups building fiber-to-the-premises technology.

Enabling HPC and AI Cloud with Ethernet Switching

These are exciting times for Mellanox, especially with Spectrum Ethernet switching.  We are experiencing extreme momentum across many different verticals and use-cases, whether it’s a cloud company or a bank or any enterprise deploying open, scale-out infrastructure.

Ethernet Storage Fabrics (ESF) provide the fastest and most efficient networking solution for storage.  ESF leverages the speed, flexibility, and cost efficiencies of Ethernet with the best switching hardware and software packaged in ideal form factors to provide performance, scalability, intelligence, high availability, and simplified management for storage.

An extreme use-case of ESF is High Performance Computing (HPC) and Artificial Intelligence (AI)/Deep Learning (DL).  We have recently gained significant wins with HPC and AI customers and partners.  This is a testament to the fact that our customers realize the value our deep experience and history in HPC brings to Ethernet.

In this blog, I’ll highlight our recent deployment with DownUnder Geosolutions.  Also, I’ll discuss our recently announced AI Cloud reference architecture with Nutanix and NVIDIA.  In future blogs, I’ll highlight some of our other HPC/AI customers and partners who are enjoying the benefits of Mellanox Spectrum Ethernet switching.

DownUnder GeoSolutions McCloud Service

DownUnder GeoSolutions (DUG) recently announced their selection of Mellanox end-to-end Ethernet for their massive exascale-focused HPC facility for seismic processing.  This facility will scale to over 40,000 compute nodes, leveraging Mellanox high-throughput, low-latency 100G Ethernet switches and adapters.

You can find details about the DUG McCloud deployment here –

DUG supercharges massive HPC cloud service with Mellanox multi-host adapter

Mellanox Powers Massive HPC Cloud Service for DownUnder Geosolutions

One of the unique aspects of the DUG network is the combination of Mellanox Multi-Host ConnectX adapters and Spectrum switches.

Mellanox’s multi-host solution

The Multi-Host solution provides significant advantages –

Efficiency was one of the solution’s main attractions for DUG – 50% fewer switches and 75% fewer cables.  This provides DUG cost savings on the network.  But, more importantly, it allows DUG to pack the most network and servers possible into their data center footprint.

In addition, network performance is critical for the McCloud service at DUG.  The Spectrum SN2700 provides performance that is not available in any other Ethernet switch.  DUG is able to leverage these performance advantages by keeping HPC workloads local to 256-node pods, each connected to a single Spectrum switch.  The advantages include –

  • Fair Traffic Distribution – all flows get fair bandwidth across the network
  • Superior Microburst Absorption – especially critical for incast traffic to the storage nodes
  • Lowest Latency – consistent 300ns latency, no matter the packet size
  • Zero Packet Loss – full line-rate forwarding at all packet sizes

The bottom line – DUG chose Mellanox End-to-End Ethernet for their McCloud HPC deployment because we are HPC experts.  We provide huge efficiencies in the data center deployment as well as ensure the best network performance for the McCloud cluster – allowing DUG to deliver a superior service while maximizing efficiency and minimizing risk.

Artificial Intelligence Enterprise Cloud with Nutanix and NVIDIA

Moving on to another exciting development, Mellanox recently partnered with Nutanix and NVIDIA to provide an Enterprise Cloud for AI Reference Architecture.  Due to the massive amount of distributed processing needed for AI/DL, it’s clear that Mellanox’s HPC-ready Spectrum Ethernet switches bring value in this environment.

Any enterprise that wants to stay relevant in the 21st century is investing in AI/DL capabilities.  The AI Cloud solution from Nutanix, NVIDIA, and Mellanox makes it easy for enterprises to quickly deploy and operate shared infrastructure for AI/DL.  Advantages of our joint solution include –

  • Simplified Operations and Troubleshooting – making it easy to deploy and operate
  • Enterprise-grade Uptime, Backup/Restore, and Disaster Recovery
  • Distributed Architecture with Linear Scaling – meet the needs of today and tomorrow
  • Built-in Security – protect your data while allowing many users of the infrastructure
  • Less Rack Space – consolidated platform for business-critical applications and AI
  • Simplified Networking – automated provisioning and full network visibility

Mellanox provides a Simplified Networking environment for the AI Cloud.  Even prior to the joint AI Cloud solution, Mellanox won Elevate Partner Awards from Nutanix for our Nutanix Ready and Calm Blueprint solutions, thanks to our integration with AHV, purpose-built HCI switches, and ability to support any workload.

Simplified Networking is critical for the AI Cloud.  AI/DL infrastructure is expensive.  It must be simple for enterprises to provision services on the infrastructure, which requires a network solution where all network provisioning is done automatically.  Furthermore, troubleshooting and identifying sub-optimal network operation is required in order to meet service-level agreements (SLAs) and maximize the utilization of the HCI and GPU investment.  Mellanox provides a complete automation and visibility solution with its NEO plug-in for Nutanix Prism, as well as unique telemetry features in its Spectrum switching hardware.

Mellanox Ethernet switching on Nutanix

Beyond simplicity, the Mellanox Spectrum switches provide additional unique advantages to the AI Cloud, including –

  • Best-in-class Performance – required for AI/DL workloads (see advantages in previous section)
  • Accelerated Flash Storage – leveraging end-to-end RoCE – Remote Direct Memory Access (RDMA) over Converged Ethernet
  • Easy Scaling – unique switch form factors to support any cluster size today or tomorrow

AI/DL environments require accelerated hardware to maximize their investments – whether it’s the GPUs in the NVIDIA DGXs or the end-to-end Ethernet solution from Mellanox.  The accelerated hardware minimizes the time needed by Data Scientists and Deep Learning Engineers to train their models, significantly increasing the productivity of the infrastructure and the Data Science and Deep Learning teams.  Furthermore, an enterprise-ready solution is required for simplified operation and always-on availability.  The Nutanix AI Cloud solution with Mellanox and NVIDIA meets these needs by leveraging technology from market and technology leaders.

Conclusion

Mellanox’s Ethernet solutions are a perfect fit for HPC and AI Cloud solutions.  We provide the performance, automation, and efficiencies required – making it easy to deploy and operate, while ensuring you get the most out of your high-end infrastructure.

We expect the exciting times to continue for a long time.  Please reach out if you want to learn more about our Spectrum Ethernet switching solutions.

Also, this article touched on only two of our examples of HPC and AI/DL Cloud.  Stay tuned for future blogs highlighting more of our customers and partners!


Mellanox Switches for Universities

Universities aren’t just moving to the cloud (though some students might be living in the clouds).  Universities are actually transforming their campuses into clouds.  The latest trend is the Education Cloud, following the path of the hyperscalers – in fact, universities are becoming microcosms of what is happening in the world of Hybrid Cloud.  They have found that shared, scale-out infrastructure makes great ‘business’ sense.  More importantly, universities realize it is critical to leverage the latest infrastructure to deliver cutting-edge education and research in the 21st century.

Let’s look at what’s happening on campus –

  • Mobility – students are on the move, connecting to education resources in places on and off campus, using many different device types.
  • Virtual Desktop Infrastructure (VDI) – learning and research compute resources are consistently managed and delivered leveraging central cloud resources.
  • Video Streaming – lectures can be delivered on demand across multiple campuses. In addition, research is leveraging video capture more and more, as considerable advancements in AI and Deep Learning allow researchers to leverage computer vision.
  • Science as a Service – high performance cloud resources can be shared across many research groups within the university.
  • Administrative Enterprise Software – ERP, scheduling, etc.

We’ve been working with universities for a long time.  For almost two decades, we’ve built the fastest networks for High Performance Computing (HPC) clusters.  Over the last several years, we’ve extended our relationship with Universities, providing an end-to-end Ethernet solution to support diverse and high-performance workloads – within a campus data center, across a campus, and even between campuses.

 

University Data Center

This is the heart of the University Cloud.  Universities are providing Infrastructure as a Service (IaaS) resources to the various teaching and research departments on campus.  They’re leveraging cloud platforms such as OpenStack, Nutanix, and VMware to provide the services.

Mellanox has the best networking solution today for University Cloud data centers –

  • Scale-out architecture from dozens to 10,000s of nodes.
  • Strong integration with cloud platforms such as OpenStack and Nutanix.
  • Market Leading Performance at 1/10/25/40/50/100 GbE.
  • High Performance Storage Acceleration with NVMe over Fabrics and Ceph RDMA.
  • Multi-tenancy with highest scale and most flexible EVPN/VxLAN on the market.
  • AI / Deep Learning Acceleration with GPUDirect and RDMA.

 

Science as a Service is growing in popularity as a University Cloud offering.  High performance infrastructure – especially Graphics Processing Units (GPUs) – is expensive, but it is the cornerstone of leveraging Big Data and AI techniques like Deep Learning within university research.  Universities have found that economies of scale are realized – and processing power is significantly increased – by building shared, on-demand infrastructure.  The Mellanox network provides the performance, scale, acceleration, and security required to build world-class research infrastructure.

In addition, universities have found significant advantages in deploying Hyperconverged Infrastructure (HCI) for supporting applications like education mobility, VDI, and cloud-based storage.  Mellanox has partnerships with several HCI providers, including Nutanix.  In fact, we won the Nutanix Elevate Partner of the Year award at their last .NEXT Conference due to our unique integration with Nutanix Prism.  We are the only Ethernet switch solution that provides visibility and complete network automation in a Nutanix environment – providing the Invisible Network to the Invisible IT Infrastructure.

 

Campus Edge & Data Center Interconnect

The Edge of the network is where the users get online.  In addition, Edge Computing is a growing trend that will certainly be found on University campuses – whether it’s IoT / Smart Campus initiatives or research departments with their own compute resources that need to interact with the University Cloud.

Mellanox has a great solution for the Campus Edge.  We provide traditional data center switches as well as switches with unique form factors.  We provide full ToR switches that support up to 100GbE in a half-RU.  These can be used to aggregate Edge switches (such as wireless or PoE), and also to provide connectivity to Edge Computing clusters – all with a small footprint and low power (60W).  These switches are ideal for communications closets or edge compute clusters located within departmental buildings.

Furthermore, network multi-tenancy can extend from the central data center across the campus.  This means that researchers can have their own logical network on the same physical network that is shared by students or other departments across the campus or at remote campuses.  This is powered by Mellanox’s highly scalable and flexible EVPN feature.

 

Mellanox provides a wide range of transceivers and cabling, providing flexibility in where the network equipment can be deployed on campus.  The cabling can be used to economically build the scale-out data center, while our optical transceivers connect the Edge sites across campus – or even to remote campuses nearby.  Our transceivers and cabling have target bit-error rates well below the industry standard.

Performance and Acceleration

Mellanox’s heritage is in high performance networking.  The passion of Mellanox is to provide best-in-class switching performance, and to leverage the network to accelerate applications like Ethernet Storage Fabrics (ESF) and AI/Deep Learning with GPUs through the use of Remote Direct Memory Access (RDMA) technology.  Mellanox pioneered end-to-end RDMA over Converged Ethernet (RoCE) systems.

The Tolly Report provides a comparison of the Mellanox Spectrum switch vs. Broadcom Tomahawk-based switches (typical of most other data center switches).  In every category that matters for Ethernet switching, the Spectrum switch performed significantly better.  This means that you can deploy a performant network across the campus and rest assured that the Cloud services will perform as needed by the wide array of users.

 

In an ESF environment, RoCE provides acceleration for storage such as NVMe over Fabrics or scale-out storage such as Ceph.  As solid-state storage continues its significant growth, the network is becoming the storage bottleneck.

 

In a Deep Learning environment, RoCE provides significant acceleration to GPU environments through the use of GPUDirect technology.  Deep Learning frameworks such as TensorFlow can leverage GPUDirect and RoCE to significantly decrease training time.  GPUs are expensive and popular, so it’s critical to leverage every possible teraflop available in the cluster.

Conclusion

Universities are making heavy investments in cutting-edge infrastructure to build their own University Cloud.  The Cloud allows students, researchers, and administrators to leverage the best technology available, while providing favorable economics to the University.

Mellanox has a cutting-edge end-to-end Ethernet solution that meets the demands of the University Cloud.  We provide the performance, acceleration, and flexibility required in a scale-out cloud data center.  In addition, Mellanox has the form factors, multi-tenant support, and optics flexibility to provide edge computing and aggregation across the campus – and even connectivity between campuses.  Indeed, Mellanox has the perfect networking solution as universities transform themselves into clouds.

The Case for Whale Sharks and Micro Data Centers

The last several years have seen a huge migration of IT infrastructure to the public cloud.  And it all makes sense: let the cloud provider invest in, manage, and scale the infrastructure, which leads to lower costs and improved organizational agility for the end-user.

Let’s compare the public cloud to the whale shark – one of the largest animals in the world. While its sheer size is imposing, it is actually docile and works in tandem with smaller fish. The smaller fish, or pilot fish, help keep parasites away from the whale shark and in return, the whale shark acts as a body guard for the smaller fish.

Now, while large public cloud providers continue to grow (the whale sharks are getting bigger!), there is also a huge growth in the number of Edge or Micro Data Centers. The fish are multiplying, because they can be more agile and faster, and go places where the public cloud cannot.

Why?  Autonomous vehicles are getting closer to reality. Smart Cities are emerging with the use of Internet of Things technology. Augmented and virtual reality (AR/VR) have seen huge advances. And, of course, enterprises are realizing that they must follow a hybrid strategy. They need aggregation data centers between their users and centralized cloud infrastructure, in addition to remote office/branch office (ROBO) hyperconverged infrastructure (HCI).

In a recent blog, Yuval Bachar, Principal Engineer of Data Center Architecture of LinkedIn, noted that, “Looking at the number of locations and servers per locations globally, the number of nodes [in edge data centers] will accumulate to be larger than the number of compute nodes in the traditional cloud data centers within the next 3-5 years.”

Hybrid Cloud, Autonomous Vehicles, AR/VR, IoT – just some of the drivers toward Micro Data Centers.

This is resulting in the need for Micro Data Centers. In general, these are data centers with power consumption of less than 1MW – and in many instances, significantly less. The Micro Data Center can take many forms, including a data center rack on wheels, a shipping container, a modular, expandable structure, a Navy ship, and even existing cell sites and telecom huts.

A wide variety of Micro Data Center form factors.

 

No matter the form factor, the Micro Data Center still provides the following values:

  • Immediate Response and Low Latency; provide compute services as close to the edge as possible.
  • Data Processing; locally process huge amounts of data for immediate action, often utilizing machine learning and analytics.
  • Data Aggregation; aggregate and summarize data to the centralized cloud, preventing the need for large and expensive data movement across the WAN.
  • Interconnectedness; provide connectivity to the edge devices as well as centralized cloud resources and other Micro Data Centers.
  • Light Footprint; small physical form factors, as well as very low power and cooling needs, allowing for flexibility in where the Micro Data Center is deployed.

No wonder the whale shark hangs out with the pilot fish!

Mellanox builds Ethernet switches that are perfect for everything from Micro Data Centers to large, scale-out data centers. This allows you to deploy the switch you need, where you need it, and leverage a single family of switches across all your data centers: small, medium, and large.

A wide range of Ethernet switch form factors – perfect for any data center, micro to very large.

 

Take, for instance, our newest switch addition, the SN2010. It provides 18x10/25G and 4x100G interfaces, all in a half rack unit. This means that you can provide a hyperconverged cluster with a fully redundant network configuration in one rack unit. The dual switch configuration will only require 160 watts of power between both switches. This is perfect for a Micro Data Center requiring a light footprint for both physical space and power.

All of Mellanox’s Spectrum switches utilize the same, best-in-class Spectrum ASIC, built by Mellanox. They provide the predictable performance required for the real-time processing done in Micro Data Centers. In fact, Mellanox Spectrum switches are the only low-latency switches on the market for 25GbE and above networking. Moreover, they provide support for GPUDirect and RDMA over Converged Ethernet (RoCE) to significantly accelerate analytics and Machine Learning.

The Micro Data Center needs the ability to connect with other data centers. This will typically be a large, centralized data center (including public cloud), as well as other, peer Micro Data Centers. Mellanox Spectrum switches provide best-in-class EVPN and VxLAN support to accomplish just that.

Best-in-class, standards-based Data Center Interconnect solution.

Mellanox Spectrum switches provide significant scale advantages for VxLAN and DCI, which is critical as the number of Micro Data Centers increases. In addition, EVPN is a controller-less technology, which means the entire control plane is embedded in the switches – and Mellanox doesn’t charge extra for additional features. Therefore, you can leverage Mellanox switches for incredibly scalable DCI solutions, and do so incredibly cost-effectively.
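
To make the pieces concrete: the data plane that EVPN automates is standard VXLAN. The hedged Python sketch below creates a VXLAN tunnel endpoint on a Linux host using iproute2 commands; the VNI, addresses, and interface names are assumptions chosen for illustration, and with EVPN running on the switches the remote VTEPs and MAC reachability would be learned via BGP rather than configured by hand.

```python
import subprocess

# Illustrative values only -- adjust to your environment.
VNI = 10100               # assumed tenant VNI
LOCAL_VTEP = "10.0.0.11"  # assumed loopback/VTEP address of this host
BRIDGE = "br-tenant1"     # assumed tenant bridge name

def run(cmd):
    """Print and execute one iproute2 command (requires root)."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Create the VXLAN interface. 'nolearning' reflects the EVPN model, where
# flood-and-learn is replaced by BGP-advertised reachability.
run(["ip", "link", "add", f"vxlan{VNI}", "type", "vxlan",
     "id", str(VNI), "local", LOCAL_VTEP, "dstport", "4789", "nolearning"])

# Attach the VNI to the tenant bridge and bring everything up.
run(["ip", "link", "add", BRIDGE, "type", "bridge"])
run(["ip", "link", "set", f"vxlan{VNI}", "master", BRIDGE])
run(["ip", "link", "set", f"vxlan{VNI}", "up"])
run(["ip", "link", "set", BRIDGE, "up"])
```

With EVPN, the switches exchange that reachability information among themselves over BGP, which is exactly why no external controller is needed.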

Ease of deployment is critical for Micro Data Centers. When dozens or hundreds of Micro Data Centers are deployed or repositioned, automation and zero-touch provisioning is required to support cost-effective deployment. Mellanox supports many options through its NEO network orchestration system and a suite of playbooks for both Cumulus Linux and MLNX-OS network operating systems.

Micro Data Centers are one of the hottest topics now in the world of data center deployment and networking. While the shift to centralized cloud has been significant, there is nothing like having data processed and acted upon as close to the user as possible.

When building a Micro Data Center, the Mellanox Spectrum switches provide the perfect solution:

  • Unique form-factors – half-rack, low power (80 watts) all the way up to full density switches,
  • Predictable Performance – zero packet loss and low latency, accelerating real-time responses to users, at speeds of 10/25/50/100 GbE,
  • Data Center Interconnect – standards-based EVPN/VxLAN support with incredible scale and no additional cost and,
  • Ease of Deployment – NEO network orchestration and NetDevOps playbooks with Cumulus Linux and MLNX-OS.

What a catch! (Haha)


Are You Flying Blind with Ceph?

Ceph storage is great.  It’s flexible – you can use it for file, block, and object storage – even at the same time.  It’s huge – in cloud environments, containers, microservices – the modern architectures.  It’s open – you can run it on any hardware you want.  It scales – you can keep adding storage nodes without the need for painful data migrations.  And it can be free – you can run the open source community version, or purchase support.

But, this sort of flexibility comes at a cost.  Out of the box, Ceph is ready to run ‘ok’ for most use-cases.  Think family minivan: it can hold a fair amount, but it’s not the biggest thing on the road – maybe you really want something like an 18-wheeler.  It can also go a little above the speed limit, but it will certainly take a while to get you where you want to go – maybe you really want something like a Porsche.

How can you make Ceph what you want – and have the visibility you need?  This blog will discuss how Mellanox Spectrum switches allow you to optimize, operate, and accelerate Ceph.

Optimize

By its nature, Ceph has a many-to-one traffic pattern, also known as ‘incast’ traffic.  When data is written to Ceph, it is distributed evenly across all data nodes.  When a client reads data, it reads directly from the Ceph data nodes, resulting in a many-to-one communication pattern – incast.  Incast can cause microbursts, particularly on the client’s network port.  Data protection, whether 3x replication or erasure coding, also results in many-to-one traffic on the cluster network.  Spectrum switches have significant advantages in supporting incast traffic.  Spectrum switches benefit from a unique, shared buffer architecture.  This means all buffering is available to all ports at any given time.  As a result, Spectrum switches offer 10x better microburst absorption compared to other switches.
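
To make the incast math concrete, here is a minimal, illustrative Python sketch (the node count, link speed, and chunk size are assumptions, not DUG or Ceph defaults) showing how the instantaneous arrival rate at a client port can exceed what that port can drain, forcing the switch to buffer the burst.

```python
# Illustrative only: models a read where many Ceph OSDs reply to one client
# at the same time (incast). All numbers below are assumptions for the sketch.

LINK_GBPS = 25            # assumed port speed (Gb/s)
NUM_OSDS = 16             # assumed number of OSDs replying at once
CHUNK_BYTES = 512 * 1024  # assumed chunk size each OSD returns (bytes)

# Each OSD can send at line rate, so during the burst the aggregate arrival
# rate at the client's switch port is roughly NUM_OSDS * line rate.
aggregate_gbps = NUM_OSDS * LINK_GBPS
burst_bytes = NUM_OSDS * CHUNK_BYTES

drain_bytes_per_us = LINK_GBPS * 1e9 / 8 / 1e6       # what the port can drain per microsecond
arrival_bytes_per_us = aggregate_gbps * 1e9 / 8 / 1e6

burst_duration_us = burst_bytes / arrival_bytes_per_us
# Peak buffer occupancy ~ what arrived minus what drained during the burst.
peak_buffer_bytes = burst_bytes - drain_bytes_per_us * burst_duration_us

print(f"aggregate arrival rate: {aggregate_gbps} Gb/s into a {LINK_GBPS} Gb/s port")
print(f"burst size: {burst_bytes / 1024:.0f} KiB over ~{burst_duration_us:.1f} us")
print(f"peak buffering needed at the egress port: ~{peak_buffer_bytes / 1024:.0f} KiB")
```

The burst itself is brief, but it is far larger than what a single port can drain in that window – exactly the behavior a deep, shared buffer is there to absorb.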

Furthermore, anyone who has spent time with a Ceph deployment knows it needs to be tuned.  There are myriad settings that can be changed – Ceph settings, kernel settings, network settings…the list goes on.

When you change the settings, how do you know that it’s optimal?  Sure, run a storage benchmark, see that you get the throughput you want, and call it a day.  But, there could be trouble lurking.  You need very detailed insight into what’s happening on the network.

Fortunately for you, Mellanox Spectrum switches have the most advanced telemetry on the market today.  Every 128 nanoseconds, the Spectrum hardware can take samples of port queue depth and bandwidth.  These samples are then pulled into a histogram to show queue utilization over a short period of time.

The histogram data can then be used to detect very critical behavior in a Ceph cluster.  As you change tuning values, you can know if Ceph is –

  • Causing congestion in the network
  • Causing latency to increase over time (queue lengths gradually increasing)
  • Causing microbursts

The level of telemetry detail on Spectrum switches is far beyond anything seen on other switches.   For instance, in this picture, a competitor’s switch would show the queue length at time 19:19:49 as congestion.  By using Spectrum’s Advanced Telemetry, it’s clear that this is really a momentary microburst.  What you’d do as a Ceph optimizer is much different.  For congestion, you’d examine your client load and/or add more network capacity.  For a microburst, you’d likely look closely at the TCP, messenger, and thread tuning values.
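
As a rough illustration of how a histogram of queue-depth samples separates the two cases, here is a small Python sketch; the synthetic traces, sampling assumptions, and thresholds are made up for the example and are not Spectrum data.

```python
from collections import Counter

# Synthetic queue-depth samples (in KB), as if taken at a fixed interval.
# These traces are invented purely to illustrate the classification idea.
microburst_trace = [2] * 900 + [400] * 20 + [2] * 80              # brief spike
congestion_trace = [2] * 200 + list(range(2, 402)) + [400] * 400  # sustained growth

HIGH_WATERMARK_KB = 300   # assumed "deep queue" threshold

def classify(samples, high=HIGH_WATERMARK_KB):
    """Classify a trace as healthy, microburst, or congestion from its histogram."""
    hist = Counter("deep" if s >= high else "shallow" for s in samples)
    deep_fraction = hist["deep"] / len(samples)
    if deep_fraction == 0:
        return "healthy"
    # A microburst shows up as a small fraction of deep samples;
    # sustained congestion dominates the histogram.
    return "microburst" if deep_fraction < 0.05 else "congestion"

print(classify(microburst_trace))   # -> microburst
print(classify(congestion_trace))   # -> congestion
```

A single averaged counter would report both traces the same way; the per-sample distribution is what tells you whether to revisit TCP, messenger, and thread tuning or to add network capacity.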

So, next time you’re tuning Ceph, make sure you’re leveraging all the data possible.  Storage benchmarking only tells you part of the story – it’s the tip of the iceberg.  You need to know that the network is performing cleanly – as issues caused by improper tuning might rear their ugly heads when things get crazy during operations….

 

Operate

Operating a Ceph cluster is a walk in the park, right?  After all, it’s self-healing, it intelligently distributes data across the cluster using the CRUSH algorithm, and let’s face it – Ceph is pure magic.

The reality is, people who have operated a Ceph cluster likely have some scars to prove it.  The author of this blog has had times where he’s felt like the Jack Torrance character from The Shining after hours upon hours of poring through Ceph logs, checking kernel counters, and looking at network counters and details – trying to find out what caused the Ceph ‘slow request’. (If you don’t know what a Ceph ‘slow request’ is – be thankful!)

Things can get worse when there is a failure.  Ceph is self-healing.  Yes, it will rebuild data and rebalance the cluster.  But, this happens at a cost.  The same storage node CPUs that are processing client requests are also responsible for performing the recovery and backfill.  Even though a Ceph cluster might not have lost data during a failure, its performance can be severely degraded.

In this situation, the Spectrum switches provide benefits beyond anything else out there.  For one, Spectrum switches are the only switches that can provide Predictable Performance – no packet loss at any packet size, consistent 300 nanosecond latency, and shared buffers that provide 10x better microburst absorption.  This means that when Ceph is busy self-healing, the Spectrum switches are providing best-in-class performance, allowing the recovery to go smoothly.  If you want to learn more about Spectrum’s Predictable Performance, check out this Tolly Report – where Spectrum performance is proven by an unbiased, 3rd party to be far superior to alternative vendors.

Furthermore, you need to know what’s going on – any information….any useful information.  The Advanced Telemetry data can be streamed to monitoring tools, in real-time.  And just like Ceph, these can be open source, free tools, such as Grafana.  You can then use the monitoring information as an additional feedback point when adjusting Ceph backfill tunables.
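
As a sketch of what streaming this data into an open-source stack can look like, the snippet below formats queue-depth samples as InfluxDB line protocol, which Grafana can then chart; the measurement and tag names are invented for this example, and the actual Spectrum/NEO streaming pipeline may use a different transport and schema.

```python
import time

def to_influx_line(switch, port, queue_depth_bytes, ts_ns=None):
    """Format one telemetry sample as InfluxDB line protocol.
    Measurement and tag names here are illustrative, not a Mellanox schema."""
    ts_ns = ts_ns if ts_ns is not None else time.time_ns()
    return (f"switch_queue_depth,switch={switch},port={port} "
            f"depth_bytes={queue_depth_bytes}i {ts_ns}")

# Hypothetical samples; in practice these would come from the switch's
# telemetry stream rather than a hard-coded list.
samples = [("leaf01", "swp12", 18432), ("leaf01", "swp12", 409600)]
for switch, port, depth in samples:
    print(to_influx_line(switch, port, depth))
```

From there, a Grafana dashboard over the same data source can overlay queue depth with Ceph recovery activity and client latency.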

In addition, the monitoring information can be used to identify other problems during operations.  Odd network traffic can indicate other issues, like run-away clients, pre-failed hard drives, and much more.  Let’s face it, being scale-out storage, Ceph is as much dependent on a high performing network as it is on the storage.  And to operate Ceph successfully, you need all the information you can get from the network.

To close out the discussion of Ceph operations, let’s have our dessert.  We’ve already discussed the highly-granular telemetry information that is available from Spectrum switches.  Well, all of that information is available to anyone with direct access through the Spectrum SDK.

Spectrum switches have the ability to run Docker containers directly on the switches.  The containers can directly access the SDK to read information about the switch, and also interact with the Network OS to configure stuff.

Furthermore, a containerized agent can be responsible for storage service policy changes.  This gives the storage administrator the ability to change specific network settings without the networking group yielding full control of the network.  This helps break down the organizational walls between groups – the storage guys can now get what they need from the network, while the network guys still maintain control of the network.
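
As a purely hypothetical sketch of such an agent – none of the file paths, function names, or policy names below come from the Mellanox SDK or NEO; they are placeholders – a containerized process might watch for a pre-approved storage-team request and apply only whitelisted policy changes:

```python
import json
import time

# Placeholder interfaces -- the real switch SDK / NOS APIs are different.
def read_pending_request(path="/run/storage_policy_request.json"):
    """Pretend the storage team drops approved requests here as JSON."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None

def apply_policy(request):
    """Stand-in for an SDK/NOS call that would adjust, for example, a QoS
    setting on the ports carrying Ceph traffic. Here we only log the intent."""
    print(f"would apply policy {request['policy']} to ports {request['ports']}")

ALLOWED_POLICIES = {"storage-priority-high", "storage-priority-normal"}  # guard rails

while True:
    req = read_pending_request()
    if req and req.get("policy") in ALLOWED_POLICIES:
        apply_policy(req)
    time.sleep(5)
```

The guard rails stay with the network team: only the policies they have whitelisted can ever be applied.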

 

Accelerate

Even if you haven’t seen the movie Zootopia, you might have seen the highly popular trailer that featured a slow-speaking sloth and a fast moving rabbit, the main character of the movie.

The comparison to storage is apt.  Traditional magnetic storage – where a platter spins and a head moves to read data – is slow as a sloth.  Milliseconds…to…get…my…data…!  Compare that to solid-state storage such as flash and Optane – with access times measured in microseconds.  Fast as a rabbit!

What does this mean for the network?  A whole lot.  Mellanox has been the driving force behind a technology called Remote Direct Memory Access, or RDMA.  In an Ethernet environment, it is called RDMA over Converged Ethernet, or RoCE.  RoCE allows applications to bypass the local operating system when transferring data across nodes – significantly increasing application performance and freeing up the CPU to do more useful things.  That’s a win-win – faster applications and lower cost, since your CPUs can run more applications.

We’ve done extensive benchmarking running Ceph with RoCE.

The Mellanox Spectrum switches used in this benchmarking are the best RoCE switches on the market today.  The Predictable Performance, consistent 300 nanosecond latency, and 10x better microburst absorption discussed above are one part of it.

Beyond that, the Spectrum switches include superior congestion avoidance and handling – including Fast ECN (cutting 8ms+ off of the time required to notify a client of congestion), and intelligent handling of flow control to avoid congestion spreading and victim flows.

The Advanced Telemetry features of Spectrum bring it all together.  The 128 nanosecond resolution of the buffer and bandwidth monitoring is crucial when storage can operate with sub-10 microsecond latency.  The smallest changes in network performance can have a major impact on storage, making real-time monitoring even more crucial.

 

Conclusion

When deploying Ceph, it is critical that you have real-time visibility into the network.  Ceph requires optimization for your use-case and ongoing insight as you operate the cluster.  These needs significantly increase as you accelerate Ceph by adopting solid-state storage.

So, when building your Ceph network, the network must provide –

  • Predictable Performance – zero packet loss, consistent low latency, superior microburst absorption
  • Advanced Telemetry – sub-microsecond sampling, real-time monitoring, on-switch agent deployment with direct SDK access
  • Network Acceleration – RDMA/RoCE, the best RoCE switch on the market