All posts by John F. Kim

About John F. Kim

John Kim is Director of Storage Marketing at Mellanox Technologies, where he helps storage customers and vendors benefit from high performance interconnects and RDMA (Remote Direct Memory Access). After starting his high tech career in an IT helpdesk, John worked in enterprise software and networked storage, with many years of solution marketing, product management, and alliances at enterprise software companies, followed by 12 years working at NetApp and EMC. Follow him on Twitter: @Tier1Storage

NVIDIA announces the EGX platform and ecosystem for smarter edge computing

Smarter Edge Computing Needs a Smarter Network

Demand for edge computing is growing rapidly because people increasingly need to analyze and use data where it’s created instead of trying to send it back to a data center. New applications cannot wait for the data to travel all the way to a centralized server, wait for it to be analyzed, then wait for the results to make the return trip. They need the data analyzed RIGHT NOW, RIGHT HERE!  To meet this need, NVIDIA just announced an expansion of their EGX platform and ecosystem, which includes server vendor certifications, hybrid cloud partners, and new GPU Operator software that enables a cloud-native deployment for GPU servers. As computing power moves to the edge, we find that smarter edge computing needs smarter networking, and so the EGX ecosystem includes networking solutions from Mellanox.

IoT Drives the Need for Edge Computing – Enabled by 5G and AI

The growth of the Internet of Things (IoT), 5G wireless, and AI are all driving the move of compute to the edge. IoT means more – and smarter – devices are generating and consuming more data, but in the field, far from traditional data centers.  Autonomous vehicles, digital video cameras, kiosks, medical equipment, building sensors, smart cash registers, location tags, and of course phones will soon generate data from billions of end points. This data needs to be collected, filtered, and analyzed, then often the distilled results transferred to another data center or endpoint somewhere else. Sending all the data back to the data center without any edge processing not only adds too much latency, it’s often too much data to transmit over WAN connections. Sometimes the data center doesn’t even have enough room to store all the unfiltered, uncompressed data from the edge.

5G brings higher bandwidth and lower latency to the edge, enabling faster data acquisition and new applications for IoT devices.  Data that previously wasn’t collected, or couldn’t be shared, is now available over the air. The faster wireless connectivity enables new applications that use and respond to data at the edge, in real-time, instead of waiting for it to be stored centrally then analyzed later (if it’s analyzed at all).

AI means more useful information can be derived from all the new data, driving quick decisions. The flood of IoT data is too voluminous to be analyzed by humans, and it requires AI technology to separate the wheat from the chaff (that is, the signal from the noise). The decisions and insights from AI then feed applications both at the edge and back in the central data center.

Figure 1: Internet of Things (IoT) and 5G wireless drive increased deployment of AI computing at the edge

NVIDIA EGX Delivers AI at the Edge

Many edge AI workloads—such as image recognition, video processing, and robotics—require massive parallel processing power, an area where NVIDIA GPUs are unmatched. To meet the need for more advanced AI processing at the edge, NVIDIA introduced their EGX platform in May. The EGX platform supports a hyper-scalable range of GPU servers—from a single NVIDIA Jetson Nano system up to a full rack of NVIDIA T4 or V100 Tensor Core servers. The Jetson Nano delivers up to half a trillion operations per second (1/2 TOPS) while a full rack of T4 servers can handle ten thousand trillion operations per second (10,000 TOPS).

NVIDIA EGX also includes container-based tools, drivers and NVIDIA CUDA-X libraries to support AI applications at the edge. EGX is supported by major server vendors and includes integration with Red Hat OpenShift to provide enterprise-class container orchestration, based on Kubernetes. This is all critical because so many of the edge computing locations—retail stores, hospitals, self-driving cars, homes, factories, cell phones, etc.— will be supported by enterprises, local government, and telcos, not by the hyperscalers.

Today NVIDIA announced new EGX features and customer implementations, along with strong support for hybrid cloud solutions. The NGC-Ready server certification program has been expanded to include tests for edge security and remote management, and the new NVIDIA GPU Operator simplifies management and operation of AI across widely distributed edge devices.

Figure 2: NVIDIA EGX platform includes GPU, CUDA-X interfaces, container management, and certified hardware partners.

Smarter Edge Needs Smarter Networking

But there is another class of technology and partners needed to make EGX—and AI at the edge—as smart and efficient as it can be: the networking. As the amount of GPU processing power at the edge grows and the number of containers increases, the amount of network traffic can also increase exponentially. Before AI, the analyzable edge data traffic (not counting streamed graphics, videos, and music going out to phones) probably flowed 95% inbound, for example from cameras to digital video recorders, from cars to servers, or from retail stores to a central data center. Any analysis or insight was usually human-driven, since people can only concentrate on one stream of video at a time, or the data was simply stored for a later date, removing the ability to make instant decisions. Now that AI solutions like EGX are deployed at the edge, they need to talk with IoT devices, back to servers in the data center, and with each other. AI applications trade data and results with standard CPUs, data from the edge is synthesized with data from the corporate data center (or the public cloud), and the results get pushed back to the kiosks, cars, appliances, MRI scanners, and phones.

The result is a massive amount of N-way data traffic between containers, IoT devices, GPU servers, the cloud and traditional centralized servers. Software-defined networking (SDN) and network virtualization play a larger role. And this expanded networking brings new security concerns as the potential attack surface for hackers and malware is much larger than before and cannot be contained inside a firewall.

As networking becomes more complex, the network must become smarter in many ways. Some examples of this are:

  1. Packets must be routed efficiently between containers, VMs, and bare metal servers.
  2. Network function virtualization and SDN demand accelerated packet switching, which could be in user space or kernel space.
  3. The use of RDMA requires hardware offloads on the NICs and intelligent traffic management on the switches.
  4. Security requires that data be encrypted at rest or in flight, or both (and whatever is encrypted must also be decrypted at some point).
  5. The growth in IoT data combined with the switch from spinning disk to flash calls for compression and deduplication of data to control storage costs.

This increased network complexity, together with new security concerns, imposes a growing burden on the edge servers as well as on the corporate and cloud servers that interface with them. With more AI power and faster network speeds, handling the network virtualization, SDN rules, and security filtering consumes an expensive share of CPU cycles, unless you have the right kind of smart network. Thus, as the network connections get faster, that network's smarts must be accelerated in hardware instead of running in software.

SmartNICs Save Edge Compute Cycles

Smarter edge compute requires smarter networking, but if this networking is handled by the CPUs or GPUs then valuable cycles are consumed by moving the data instead of analyzing and transforming it. Someone must encode and decode overlay network headers, determine which packet goes to which container, and ensure that SDN rules are followed. Software-defined firewalls and routers impose additional CPU burdens as packets must be filtered based on source, destination, headers, or even on the internal content of the packets. Then the packets are forwarded, mirrored, rerouted, or even dropped, depending on the network rules.

Fortunately, there is a class of affordable SmartNICs, such as the Mellanox ConnectX family, which offload all this work from the CPU. These adapters have hardware-accelerated functions to handle overlay networks, Remote Direct Memory Access, container networking, virtual switching, storage networking, and video streaming. They also accelerate the adapter side of network congestion management and QoS, and the newest adapters, such as the Mellanox ConnectX-6 Dx, can perform in-line encryption and decryption in hardware at high speeds, supporting IPsec and TLS. With these important but repetitive network tasks safely accelerated by the NIC, the CPUs and GPUs at the edge connect quickly and efficiently with each other and the IoT, all the while focusing their core cycles on what they do best—running applications and parallelized processing of complex data.

BlueField IPU Adds Extra Protection Against Malware and Overwork

An even more advanced network option for edge compute efficiency is an IPU, or I/O Processing Unit, such as the Mellanox BlueField. An IPU combines all the high-speed networking and offloads of a SmartNIC with programmable cores that can handle additional functions around networking, storage, or security. For example, an IPU can offload both SDN data plane and control plane functions; it can virtualize flash storage for CPUs or GPUs; and it can implement security in a separate domain to provide very high levels of protection against malware.

On the security side, IPUs such as BlueField provide security domain isolation. Without it, any security software is running in the same domain as the OS, container management, and application. If an attacker compromises any of those, the security software is at risk of being bypassed, removed, or corrupted. But with BlueField, the security software continues running on the IPU where it can continue to detect, isolate, and report malware or breaches on the server. By running in a separate domain—protected by a hardware root of trust—the security features can sound the alarm to intrusion detection and prevention mechanisms and also prevent malware on the infected server from spreading.

The newest BlueField-2 IPU also adds Regular Expression (RegEx) matching that can quickly detect patterns in network traffic (or server memory), so it can be used for threat identification. It also adds hardware offloads for data efficiency: deduplication (using a SHA-2 hash) and compression/decompression.
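
To make the deduplication offload concrete, here is a minimal Python sketch of content-hash deduplication, the kind of work BlueField-2 can accelerate in hardware rather than running in software like this. The 4 KiB block size, the choice of SHA-256 (a SHA-2 family hash), and the in-memory dictionary are illustrative assumptions, not a description of the BlueField implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, purely for illustration


def dedupe(data: bytes):
    """Split data into fixed-size blocks and store only unique blocks,
    indexed by their SHA-256 digest."""
    unique_blocks = {}   # digest -> block payload (stored once)
    block_refs = []      # ordered digests needed to rebuild the stream
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        unique_blocks.setdefault(digest, block)
        block_refs.append(digest)
    return unique_blocks, block_refs


if __name__ == "__main__":
    payload = (b"A" * 8192) + (b"B" * 4096) + (b"A" * 4096)  # repeated content
    blocks, refs = dedupe(payload)
    print(f"logical blocks: {len(refs)}, unique blocks stored: {len(blocks)}")
    assert b"".join(blocks[d] for d in refs) == payload  # stream rebuilds exactly
```

Identical blocks hash to the same digest, so each one is stored only once; doing this fingerprinting in NIC hardware keeps the host CPU free for application work.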

Smarter Edge Requires Smarter Networking 

With the increasing use of AI solutions like the NVIDIA EGX platform, the edge becomes dramatically smarter, but networking and security also get more complex and threaten to slow down servers just when the growth of IoT and 5G wireless requires more compute power. This can be solved with the deployment of SmartNICs and IPUs, such as the Mellanox ConnectX and BlueField product families. These network solutions offload important network and security tasks, such as SDN, network virtualization, and software-defined firewall functions. This allows AI at the edge to run more efficiently and securely, and that's why Mellanox and NVIDIA are excited to work together around edge-based solutions.

 

The Cost of Keeping Up: Overcoming Entertainment’s Technology Catch-22

The late 1990s represented a tipping point for the entertainment industry. The emergence of high-definition television and the standardization on DVDs set into motion exponential growth in video file sizes — a cycle that continues to this day. Now, with the emergence of Ultra High Definition (UHD) and native 4K video, the amount of data being created, edited, rendered, and stored in the entertainment industry is exploding. It’s tripling or even quadrupling in volume according to Filmlance, an award-winning indie film production company based in Sweden with over 100 titles under its belt. A television show produced by the studio, which might have run at around 10 terabytes of raw source material to use in post-production a few years ago, now weighs in at around 40 terabytes of video — not including additional visual effects (VFX) and intermediate and delivery files, which have also grown at the same rate to stay consistent with resolution quality.

With the shoot and production time for a typical movie pegged at 12 to 18 months, this has created the entertainment industry’s own proverbial technology catch-22: do you select and deploy infrastructure that will enable you to deliver entertaining content only today, or anticipate and prepare for the future? Faced with this dilemma, Filmlance went to Mellanox partner DDN Storage to find a solution that could support today’s production needs while also keeping pace with rapidly changing and growing data demands, without breaking the bank.

Working with massive uncompressed files ranging from 1080p to UHD and native 4K can be a drain on network performance and result in extended wait times for editing and rendering videos along with other post-production processes. These issues add up to an outdated data network becoming a major bottleneck and liability at the center of a studio’s work, ultimately resulting in headaches, additional costs, and delays.

The Old Fibre Channel vs. New Ethernet

When deciding between investing in an update or complete overhaul of network infrastructure for a film studio, as with any industry, it comes down to weighing the value of short-term and long-term needs and finding the most cost-effective solution to deliver on today’s demands while setting them up for tomorrow’s as well. For Filmlance, the options were to continue with its Fibre Channel-based StorNext® platform or transition to a new system, including an infrastructure overhaul, capable of scaling for future demands. The studio decided to go with an option somewhere in the middle, choosing MEDIAScaler™ from DDN Storage, which would allow it to continue using most of its existing fiber cabling while breaking away from licensing costs associated with their old platform and integrating next-generation network technologies to support data-heavy activities.

The new system enables Filmlance to utilize an IBM Spectrum Scale parallel file system client and 40/56GbE Mellanox Ethernet network with 4x56GbE redundant switches for high-performance activities, with the goal of eventually moving all functions over to the new architecture as needed. Across its current architecture, Filmlance can now tap into clients with connectivity ranging in speeds from 1Gb/s to 56Gb/s without any permission issues and begin transitioning to a more scalable architecture for all its systems at a speed that fits its budget without compromising on performance where it needs it most.

Filmlance CTO Henrik Cednert explained to DDN Storage in the company’s case study with Mellanox on the project, “It would have been cheaper to continue with StorNext because we already had the supporting infrastructure in place. However, we knew it was time to make the move into something new if we were to meet the constantly growing demand for active storage capacity. Not something we took lightly but we decided that we had to entirely replace the old solution.”

Cednert went on to discuss the value of Ethernet for performance and cost savings:

“If needed, we can connect with any computer in our IT infrastructure using regular Gigabit Ethernet. This is far more versatile than other solutions where some require additional servers, licenses and a more complicated setup. As a result, the DDN solution is much more cost effective for us. We have everything in one system with less management and much less headache.”

With plans to completely transition its infrastructure to the MEDIAScaler platform for end-to-end workflows, Filmlance sees the future of its industry relying on high-speed network solutions that offer greater interoperability and scalability, which will positively impact a studio’s overall performance and bottom line. To meet rapid developments in how entertainment is delivered, from VR games to increasingly high-definition, large format videos, data networks play one of the most essential roles in facilitating an entertainment company’s success. For companies like Filmlance, the demands they’re seeing are rendering previous-generation solutions archaic, as today’s requirements are now for network technologies that were developed with scaling and growth at their heart.

That being said, in the end, all that really matters is being able to create content that grabs the consumer’s imagination and allows them to be enveloped by the story and experience — something that takes a lot more than it used to!


Mellanox Makes Mining of Bitcoin and Other Cryptocurrency Faster and Greener

Unless you’ve been living under a rock (or spending all your time binge-watching TV shows and playing fantasy football), you probably already know that investing in Bitcoin and other cryptocurrencies is all the rage. Bitcoin has gained over 2,000 percent in the last year, trading as high as $19,000 recently, and U.S. exchanges just approved the trading of Bitcoin futures. Some people like cryptocurrencies because they are not controlled by any government; others value the relatively anonymous nature of cryptocurrency ownership; some feel they are immune to inflation because they cannot be “printed” at will by governments; and the rest seem to like them simply as the latest trendy speculative investment.

 

Figure 1: Bitcoin prices have passed $19,000 at times in December 2017.

Bitcoin Relies on Complex Mathematical Hashes

Regardless of why bitcoin is hot, it’s inextricably linked to math. Bitcoin and most other cryptocurrencies are built on blockchain technology, a series of complex math calculations called hashes that are connected in a verifiable chain called a ledger. Each transaction is verified by a hash and added on to previous transactions in the ledger (the blockchain), which allows anyone to verify the authenticity of the currency, the current owner, and any transfers (purchases).

Figure 2: McDonalds Hash, $2.50; Bitcoin blockchain hash, up to $19,000 if you solve the right problem first.

How to Mine Bitcoin

Verifying transactions with existing bitcoin is computationally intensive and earns miners a payment. Solving the hashes that it takes to create new bitcoin takes a lot more computation, called “mining,” and generates a much bigger payoff. It requires a lot of calculations, and you’re competing with other miners to find the correct hash first. Only a tiny fraction of the hashes attempted wins the right to add a new block of transactions, with a new block (and its reward of new bitcoin) created roughly every ten minutes. The blockchain algorithm adjusts about every two weeks, making the hash calculations more complex as more computing power worldwide is devoted to mining. This keeps the discovery rate of new blocks roughly constant, and over time, mining each new block requires either more time or more computing power. The miners with the most computing power can mine new blocks faster, resulting in an “arms race” to find the fastest way to win the roughly 144 new blocks that can be mined per day. Because only a minuscule share of the hash calculations produces new bitcoin, cryptocurrency mining servers are measured in millions, billions, or trillions of hashes per second (MH/s, GH/s, or TH/s).
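
To see why mining rigs are rated in hashes per second, here is a simplified proof-of-work sketch in Python. It assumes Bitcoin-style double SHA-256 hashing but uses a toy difficulty and a made-up block format; real mining hashes a precisely defined block header against a vastly higher difficulty target.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    # Bitcoin hashes candidate block headers twice with SHA-256
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(prev_block_hash: str, transactions: str, difficulty_bits: int = 20):
    """Search for a nonce whose double-SHA-256 hash falls below a target.
    difficulty_bits=20 is a toy setting; the real network retargets
    difficulty roughly every two weeks to keep block times near 10 minutes."""
    target = 2 ** (256 - difficulty_bits)
    nonce = 0
    while True:
        header = f"{prev_block_hash}|{transactions}|{nonce}".encode()
        digest = double_sha256(header)
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
        nonce += 1


if __name__ == "__main__":
    nonce, block_hash = mine("placeholder-previous-block-hash", "alice->bob:1.5")
    print(f"found nonce {nonce}, block hash {block_hash}")
```

Every additional difficulty bit doubles the expected number of hashes needed, which is why miners keep racing to add hash rate and why the efficiency of every component in the cluster matters.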

A good article in Investopedia explains how bitcoin mining works here.

It’s actually not unlike mining gold during a gold rush. At first, it’s easy to find on the ground or with simple tools, but as more gold is extracted, the harder it becomes to get the rest, with more miners competing for less gold, which is deeper in the ground.

Figure 3: Gold mining before and after parallel processing.

 

Supplying Pickaxes and Shovels to the Miners

In the famous California Gold Rush of 1849, up to 300,000 miners flocked to California from all over the world to seek their fortune. There was an additional influx of people to work the stores, build the railroads, construct buildings, pour the drinks, and wash the clothes. Some found a little gold and some found a lot, but generally only two groups of people became very rich: the large-scale industrial miners and those selling tools and supplies to the miners. A typical gold rush miner could make $20/day in 1850, or $585/day in 2016 dollars. On the other hand, a single egg could sell for $3.00, or $90 in today’s prices. A washer-woman could make a profit of $100 per week ($2,927/week in 2016 dollars) and a shovel cost $36 ($1,083 in 2016 dollars).

Leland Stanford started as a Gold Rush merchant and wholesaler and ended up as a railroad president and robber baron, later becoming California governor then Senator before funding his namesake university in 1891. Levi Strauss also became rich selling dry goods and then durable clothing — including durable riveted denim blue jeans — to miners. As gold mining evolved from easy panning in streams to large-scale industrial operations, the suppliers like Stanford and Strauss still made money selling to the miners.

 

Figure 4: The SF 49ers play with gold-colored helmets in Levi’s Stadium — that’s all connected historically.

 

GPUs, FPGAs, and ASICs Save Energy for “Greener” Cryptocurrency

It turns out the biggest cost for modern bitcoin miners is power. You only pay for the servers once every few years, but they quaff electricity 24x7x365, and the more servers you deploy, the more power you use. One recent estimate is that worldwide bitcoin mining consumes the same amount of electricity as 3 million U.S. homes, or more than many small countries consume. Producing one bitcoin was estimated to cost $2,856 at the start of 2017, rising to $6,611 at the end of 2017, mostly due to the electrical costs needed to drive the increasingly complex hash calculations. Thus, bitcoin miners need to balance computing cores with power efficiency; that is, they ask which servers will produce the most bitcoin per watt, not just the most bitcoin per day. Originally, they shopped for processors based on millions of hashes per second (MH/s), but now they look for the lowest watts per million hashes (W/MH).
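
The arithmetic behind that shift is simple. The sketch below compares hypothetical rigs; the hash rates, wattages, and electricity price are made-up assumptions used only to show how W/MH and daily power cost are calculated.

```python
# Hypothetical miners: (name, hash rate in MH/s, power draw in watts)
miners = [
    ("CPU rig", 10, 200),
    ("GPU rig", 500, 600),
    ("ASIC rig", 14_000_000, 1_400),  # a TH/s-class device expressed in MH/s
]

PRICE_PER_KWH = 0.10  # assumed electricity price in USD

for name, mh_per_s, watts in miners:
    w_per_mh = watts / mh_per_s              # the W/MH efficiency metric
    kwh_per_day = watts * 24 / 1000
    cost_per_day = kwh_per_day * PRICE_PER_KWH
    print(f"{name:9s} {w_per_mh:10.6f} W/MH   ${cost_per_day:.2f} of electricity per day")
```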

Bitcoin mining with regular x86 CPUs is no longer competitive. Miners are using Graphics Processing Units (GPUs) such as the NVIDIA Tesla or AMD Radeon HD 5870, which are much better at calculating hashes in parallel. Others use customized FPGAs and ASICs, which are much more power-efficient, using less electricity per hash calculation. Going forward, miners might look at new CPU architectures such as Power9, which is more efficient at parallel computing than x86, or at ARM servers, which are known for being very power-efficient.

Mellanox Supplies Picks and Shovels to the Cryptocurrency Gold Rush

As you deploy these processors in large clusters, you need a fast and efficient network to connect them to maximize your chances of solving any particular block of hashes first, and collecting the resulting reward of a new bitcoin. Making your network more efficient makes your cluster more efficient, and Mellanox offers several advantages in this area.

  • Mellanox works closely with Intel, IBM, AMD, NVIDIA, and Quantum, as well as all the large server vendors, so our adapters can connect x86, Power9, ARM, GPU, FPGA, and ASIC-based servers. When new server CPUs are launched, Mellanox network adapters are often the first ones supported.
  • Mellanox adapters support both Ethernet and InfiniBand at all popular network speeds, including 10, 20, 25, 40, 50, 56, and 100Gb/s. In fact, Mellanox Ethernet adapters were the first to support new speeds such as 40GbE, 25GbE, 50GbE, and 100GbE.
  • Mellanox adapters have ASICs that support smart hardware offloads, making mining servers more efficient because the whole CPU (or GPU or FPGA) can be used to solve hashes instead of devoting part of it to networking and data movement.
  • Mellanox adapters and switches are very low latency, offering fair and predictable performance for very large server and storage clusters.
  • Mellanox adapters and switches are generally the most power efficient available, reducing the mining power costs.

No matter what processor and server platform you choose to mine cryptocurrency, you need a fast and efficient network to connect it all. Mellanox is the best provider of fast, efficient networks, has been enjoying great success supporting the mining of cryptocurrencies such as Bitcoin and Ethereum, and is well positioned as a supplier of digital tools to help ambitious cryptocurrency miners going forward.


IBM Demonstrates NVMe Over Fabrics on InfiniBand with Power9 Servers and PCIe Gen 4

Today, at the AI Summit New York, IBM is demonstrating a technology preview of NVMe over Fabrics using their Power9 servers, Mellanox InfiniBand connectivity, and IBM Flash Storage.

As I mentioned in my blog three weeks ago during the SC17 conference, the IBM FlashSystem 900 array is an excellent candidate to support NVMe over Fabrics. It is a superfast flash array with very low latency and already supports the SCSI RDMA Protocol (SRP) over InfiniBand connections.

IBM is a strong player in several industries and solutions that require high bandwidth and/or low latency, such as High Performance Computing (HPC), Media and Entertainment, and Database. And of course IBM has been a long-time leader and innovator in Artificial Intelligence (AI).

Figure 1: The IBM FlashSystem 900 features very low latency and is now being demonstrated with NVMe over Fabrics over InfiniBand.

 

 

Using NVMe-oF on InfiniBand to Support AI

One of their demonstrations at the AI Summit is IBM PowerAI Vision, which can automatically and quickly recognize and classify objects with image recognition via neural networks. In this case, an IBM AC922 server, running the Power9 CPU, connects to five FlashSystem 900 storage arrays using networking technology from Mellanox. This includes a Mellanox Switch-IB 2 SB7800 switch, which supports InfiniBand networking at DDR, QDR, FDR, and EDR speeds (20, 40, 56, or 100Gb/s). This tech preview is achieving 41 gigabytes per second of throughput (23GB/s of reads plus 18GB/s of writes) using a single Mellanox ConnectX-5 dual-port (2x100Gb) adapter in the server.
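
That 41GB/s figure works because reads and writes travel in opposite directions: a dual-port 2x100Gb/s adapter provides roughly 25GB/s of raw bandwidth each way, so 23GB/s of reads and 18GB/s of writes each fit within their own direction. A quick sketch of the arithmetic, ignoring protocol overhead:

```python
# Dual-port 100Gb/s adapter: raw line rate per direction
ports = 2
gbits_per_port = 100

per_direction_gbs = ports * gbits_per_port / 8   # ~25 GB/s each way
reads_gbs, writes_gbs = 23, 18                   # figures quoted for the demo

print(f"capacity per direction: {per_direction_gbs:.0f} GB/s")
print(f"reads fit:  {reads_gbs} GB/s <= {per_direction_gbs:.0f} GB/s -> {reads_gbs <= per_direction_gbs}")
print(f"writes fit: {writes_gbs} GB/s <= {per_direction_gbs:.0f} GB/s -> {writes_gbs <= per_direction_gbs}")
print(f"total full-duplex throughput: {reads_gbs + writes_gbs} GB/s")
```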

Showcasing the Power of PCIe Gen 4

This is also one of the first demonstrations of a server connecting to the Mellanox ConnectX adapter using PCIe Gen 4 technology, which can support 2x faster data transfers per lane than PCIe Gen 3.  Mellanox networking technology offers the fastest performance whether on Ethernet or InfiniBand, and includes the first NICs and HBAs to support PCIe Gen 4 slots in servers. Mellanox is shipping end-to-end 100Gb/s Ethernet and InfiniBand solutions today, including adapters, switches, cables and transceivers, with 200Gb/s technology coming soon for both Ethernet and InfiniBand. With PCIe Gen 4, a single adapter can easily support throughput up to 200Gb/s, so it’s no surprise that the fastest storage and server vendors in the world, like IBM, have chosen Mellanox to connect their solutions together and demonstrate NVMe over Fabrics.

Upcoming G2M Webinar about NVMe Over Fabrics

  • To learn more about NVMe over Fabrics technology directions and trends, you can attend an upcoming webinar next Tuesday (December 12, 9am PST or 12pm EST) hosted by G2M, Mellanox, and other leading vendors in the NVMe and NVMe-oF space. Featured speakers include: Howard Marks of DeepStorage, Mike Heumann of G2M, and Rob Davis of Mellanox. You can register for this webinar here: http://bit.ly/2jOJaUz


NVMe Over Fabrics on InfiniBand, the World’s Fastest Storage Network

G2M Research, an analyst firm that specializes in solid state storage, held a webinar on October 24, 2017, about NVMe and NVMe over Fabrics (NVMe-oF). In it, they predicted rapid growth in the NVMe market, including rising demand for specialized network adapters, and they named Mellanox as the “Highest Footprint” vendor with the largest share of these adapters. Back in August 2017 at Flash Memory Summit, IT Brand Pulse readers voted Mellanox as the leading provider of NVMe-oF network adapters. Neither of these is any surprise, since Mellanox was first to market with 25, 40, 50, and 100GbE adapters and has been a longtime leader in the Remote Direct Memory Access (RDMA) technology that is currently required for NVMe-oF.

However, while most of the news about NVMe-oF has focused on Ethernet (using RoCE), some well-known storage vendors are supporting NVMe-oF over InfiniBand.

NetApp E-Series Supports NVMe-oF on InfiniBand

In September 2017, NetApp announced their new E-5700 hybrid storage array and EF-570 all-flash arrays, which both support NVMe over Fabrics (NVMe-oF) connections to hosts (servers) using EDR 100Gb/s InfiniBand. This made NetApp the first large enterprise vendor to support NVMe over Fabrics to the host, and the first enterprise storage vendor to support EDR 100Gb/s InfiniBand. These are also — as far as I know — the first all-flash and hybrid arrays to support three block storage protocols with RDMA: NVMe-oF, iSER, and SRP. Rob Davis wrote about the NetApp announcement in his recent blog.

Figure 1: NetApp EF-570 supports NVMe-oF, iSER and SRP on EDR 100Gb InfiniBand.

Excelero demonstrates NVMe-oF on InfiniBand at SC17

Excelero has been a fast-moving software-defined storage (SDS) innovator, supporting NVMe-oF both as a disaggregated flash array and in a hyperconverged configuration. They support both 100Gb Ethernet and EDR 100Gb InfiniBand, and they are demonstrating their solution using InfiniBand in the Mellanox booth #653 at Supercomputing 2017.

Other Systems Likely to Add NVMe-oF on InfiniBand Support

In addition to the publicly declared demonstrations, there are other vendors who already support InfiniBand front-end (host) connections and could add NVMe-oF support fairly easily. For example, the IBM FlashSystem 900 has a long history of supporting InfiniBand host connections and is known for high performance and low latency, even amongst other all-flash arrays. IBM also has a strong history of delivering HPC and technical computing solutions, including storage. So it wouldn’t be much of a surprise if IBM decided to add NVMe-oF support over InfiniBand to the FlashSystem 900 in the future.

InfiniBand is the World’s Fastest Storage Network

NVMe-oF allows networked access to NVMe flash devices, which themselves are faster and more efficient than SAS- or SATA- connected SSDs. Because it eliminates the SCSI layer, NVMe-oF is more efficient than iSCSI, Fibre Channel Protocol (FCP), or Fibre Channel over Ethernet (FCoE). But with more efficient storage devices and protocols, the underlying network latency becomes more important.

InfiniBand is appealing because it is the world’s fastest storage networking technology. It supports the highest bandwidth (EDR 100Gb/s shipping since 2014) and the lowest latency of any major fabric technology, <90ns port-to-port, which is far lower than 32Gb Fibre Channel and slightly lower than 100Gb Ethernet.  InfiniBand is a lossless network with credit-based flow control and built-in congestion control and QoS mechanisms.

Since the NetApp E-series arrays are very fast — positioned for “Extreme Performance”— and NetApp is targeting high-performance workloads such as analytics, video processing, high performance computing (HPC), and machine learning, it’s no surprise that the product family has long supported InfiniBand and the newest models support EDR InfiniBand.

Likewise, Excelero positions their NVMesh® to meet the needs of the most demanding enterprise and cloud-scale applications, while the IBM FlashSystem 900 is designed to accelerate demanding applications such as online transaction processing (OLTP), analytics databases, virtual desktop infrastructure (VDI), technical computing applications, and cloud environments. With their focus on these applications, it makes sense that they support InfiniBand as a host connection option.

Figure 2: The IBM FlashSystem 900 already supports InfiniBand host connections and IBM has promised to add support soon for NVMe technology.

InfiniBand Supports Multi-Lingual Storage Networking

InfiniBand is a versatile transport for storage. Besides supporting NVMe-oF, it supports iSCSI Extensions for RDMA (iSER) and the SCSI RDMA Protocol (SRP). It can also be used for SMB Direct, NFS over RDMA, Ceph, and most non-RDMA storage protocols that run over TCP/IP (using IP-over-IB). One of the innovative aspects of the new NetApp E-5700 and EF-570 is that they are “trilingual” and support any of the three block storage protocols over EDR (or FDR) InfiniBand. The IBM FlashSystem 900 also supports SRP and will presumably become “bilingual” on InfiniBand storage protocols after adding NVMe-oF.

So, whether you are already using SRP for HPC or want to adopt NVMe-oF as the newest and most efficient block storage protocol (or use iSER with NetApp), Mellanox InfiniBand has you covered.

 

 

Figure 3: The Mellanox Switch-IB 2 family supports up to 36 ports at EDR 100Gb/s in a 1U switch or up to 648 ports in a chassis switch, with port-to-port latency below 90 nanoseconds.

 

Cloud, HPC, Media, and Database Customers Drive Demand for EDR InfiniBand

Now, who exactly needs InfiniBand connections, or any type of 100Gb connection to the storage? If most storage customers have been running on 10Gb Ethernet and 8/16Gb Fibre Channel, what would drive someone to jump to 100Gb networking? It turns out many high performance computing (HPC), cloud, media, and database customers need this high level of storage networking performance to connect to flash arrays.

HPC customers are on the cutting edge of pure performance, wanting either the most bandwidth or the lowest latency, or both. Bandwidth allows them to move more data to where it can be analyzed or used. Low latency lets them compute and share results faster. EDR InfiniBand is the clear winner either way, with the highest bandwidth and lowest latency of any storage networking fabric. Several machine learning (ML) and artificial intelligence (AI) applications also support RDMA and perform better using the super low latency of InfiniBand. And the latest servers from vendors such as Dell EMC, HPE, Lenovo, and Supermicro can all be ordered with FDR 56Gb or EDR 100Gb InfiniBand adapters (based on Mellanox technology).

Cloud customers are on the cutting edge of scale, efficiency, and disaggregation, adopting whatever lets them support more users, more applications, and more VMs or containers in the most efficient way. As they use virtualization to pack more applications onto each physical host, they need faster and faster networking. And as they disaggregate flash storage (by moving it from individual servers to centralized flash pools) to improve efficiency, they need NVMe-oF to access that flash efficiently. While most cloud customers are running Ethernet, there are some who have built their networks on InfiniBand and so want InfiniBand-connected storage arrays to power their cloud operations.

Media and Entertainment customers are scrambling to deal with ultra-high definition (UHD) video at 4K and 8K resolutions. 4K cinema video has almost 4.3x the pixels of standard HD TV (4K TV video is slightly lower, with only 4x the pixels of HD TV). While the initial capture and final broadcast use compression, many of the editing, rendering, and special effects steps used to create your favorite TV shows and movies require dealing with uncompressed video streams, often in real-time. These uncompressed 4K streams cannot fit in an 8Gb or 10Gb pipe, and sometimes even exceed what 16Gb FC can do. This has pushed many media production customers to use FDR (56Gb) or EDR (100Gb) InfiniBand, and they need fast storage to match that.
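
It helps to run the numbers on an uncompressed stream. The sketch below uses assumed sampling formats and frame rates purely for illustration and ignores blanking and protocol overhead:

```python
def stream_gbps(width, height, bits_per_pixel, fps):
    """Raw (uncompressed) video bit rate in Gb/s."""
    return width * height * bits_per_pixel * fps / 1e9

# DCI 4K frames with a few assumed bit depths, samplings, and frame rates
cases = [
    ("4K 10-bit 4:2:2 @ 24fps", 4096, 2160, 20, 24),
    ("4K 10-bit RGB   @ 60fps", 4096, 2160, 30, 60),
    ("4K 12-bit RGB   @ 60fps", 4096, 2160, 36, 60),
]
for label, w, h, bpp, fps in cases:
    print(f"{label}: {stream_gbps(w, h, bpp, fps):5.1f} Gb/s")
```

Higher frame rates, deeper bit depths, and multiple simultaneous streams quickly push past what an 8Gb, 10Gb, or even 16Gb link can carry, which is exactly what drives studios toward 56Gb and 100Gb fabrics.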

Figure 4: Adoption of 4K and 8K video is driving media and entertainment companies to adopt high-speed networking for storage, including EDR InfiniBand.

Database over InfiniBand may surprise some of you, but it makes perfect sense because database servers need low latency, both between each other and to the storage. The Oracle Engineered Systems (Exadata, Exalogic, Exalytics) are designed around an InfiniBand fabric and Oracle RAC server clustering software supports InfiniBand. Even in the cloud, a majority of financial or e-commerce transactions ultimately go through a SQL database, and low latency for the database is critical to ensure a smooth online and e-commerce experience.

Leveraging The World’s Fastest Fabric for Storage

InfiniBand is the world’s fastest fabric with 100Gb/s today and 200Gb/s products announced and coming soon. While most of the world’s networked storage deployments are moving to Ethernet, it’s clear that when the fastest possible storage connections with the lowest latency are needed, InfiniBand is often the best choice. With new flash array support for NVMe over Fabrics, IBM and NetApp are supporting the world’s most efficient block storage protocol on top of the world’s fastest storage networking technology, and I expect the result will be many happy customers enjoying superfast storage performance.


New HPE StoreFabric M-series Switches Power Ethernet Storage Fabric

HPE Launches New Ethernet Switches for Storage

Today, Hewlett Packard Enterprise (HPE) announced their new StoreFabric M-series Ethernet switches, which are built on Mellanox Spectrum switch technology. This is an exciting new product line, specifically designed for storage workloads and ideal for building an Ethernet Storage Fabric (ESF). The switches are perfect for building fast and scalable storage networks for block, object, and file storage, as well as hyperconverged infrastructure (HCI). They make it easy to start by connecting a few nodes in a rack, then scale up to a full rack, and later to hundreds of nodes connected across many racks, running at speeds from 1GbE up to 100GbE.

Figure 1: HPE StoreFabric M-series switches are ideal for building an Ethernet Storage Fabric

 

Why HPE Needs an Ethernet Storage Switch

HPE has long sold Fibre Channel SAN switches but this is their first Ethernet switch specifically targeted at storage. It turns out, Ethernet-connected storage is growing much more rapidly than FC-SAN connected storage, and about 80 percent of storage capacity today is well suited for Ethernet (or can only run on Ethernet).  Only 20 percent of storage capacity is the kind of Tier-1 block storage that traditionally goes on FC-SAN, and even most of that block storage can also run on iSCSI or newer block protocols such as iSER (iSCSI RDMA) and NVMe over Fabrics (NVMe-oF, over Ethernet RDMA).

If you look at HPE’s extensive storage lineup, the products which are more focused on Ethernet are growing much faster than those focused on Fibre Channel.

  • The very high-end XP enterprise arrays are probably growing very slowly and are almost entirely FC- or FCoE-connected.
  • The high-end 3PAR arrays are growing modestly and are mostly FC-connected (I would guess 75 percent FC today) but their Ethernet connect rate is rising.
  • The HPE Nimble Storage arrays were growing at a robust 28 percent/year when HPE acquired Nimble, and are mostly Ethernet-connected (I’d guess at least 70 percent Ethernet).
  • The HPE SimpliVity HCI solution is growing super quickly and is 100 percent Ethernet.

HPE also has key storage software partners who specialize in file storage (like Qumulo), object storage (like Scality), and hyper converged secondary storage (like Cohesity). And HPE servers also get deployed with other HCI or software-defined storage solutions such as VMware VSAN, Ceph, and Microsoft Windows Storage Spaces Direct — all products which require Ethernet networking. So, while Fibre Channel remains important to key HPE customers and storage products, most or all of the growth is in Ethernet-connected solutions. It makes perfect sense for HPE to offer a line of Ethernet switches optimized for Ethernet storage.

 

Figure 2: The HPE M-series switches support many kinds of storage arrays, tiers, and HPE storage partners.

 

There is No Fibre Channel in the Cloud

Currently, the single most powerful trend in IT is the cloud. Workloads are moving to the public cloud and enterprises are transforming their on-premises IT infrastructure to emulate the cloud to achieve similar cost savings and efficiency gains. All the major cloud providers long ago realized that Fibre Channel is too expensive, too inflexible, and too limited as a storage network for their highly scalable, super-efficient deployments. Hence, all the public clouds run both compute and storage on Ethernet (except for those workloads that need the highest performance and efficiency and therefore run on InfiniBand), and large enterprises are following suit. They are deploying more virtualization, more containers, and more hyperconverged infrastructure to increase their flexibility and agility. As enterprises build private and hybrid clouds using HPE storage and servers, it makes sense that they would look for fast, reliable HPE Ethernet switches to power their own cloud deployments.

 

Mellanox Spectrum is Ideal for Storage Networking

Now what kind of Ethernet switch is ideal for storage? First, it must be FAST, meaning high-bandwidth, non-blocking, and with consistently low latency. As noted in my previous blogs, faster storage needs faster networks, especially for all-flash arrays. HPE is the world’s #1 enterprise storage systems vendor according to IDC (IDC Worldwide Quarterly Enterprise Storage Systems Tracker, 2Q 2017), so we can assume they sell more flash storage than just about anyone else. These faster systems need faster connections. While Fibre Channel recently reached 32Gb/s, there are already all-flash arrays on the market making full use of 100Gb Ethernet. And 100GbE delivers 3x the performance of 32Gb FC at 1/3rd the price — a 9x advantage in price-performance.

The trend amongst top storage vendors is also to support NVMe SSDs and the NVMe over Fabrics protocol, which requires higher bandwidth, lower latency, and an RDMA-capable network. HPE Servers — one of the world’s most popular server brands — already support 25, 40 and 100GbE networking (most often with Mellanox adapters), and we can assume that HPE Storage flash arrays will support faster Ethernet speeds such as 25, 40, 50, or 100GbE in the future.

This means an Ethernet storage switch needs to be ready to support these faster speeds, not just with high bandwidth but also with features like RDMA over Converged Ethernet (RoCE), non-blocking performance, zero packet loss, and consistently low latency. Mellanox Spectrum switches, and now HPE StoreFabric M-series switches, are best in class in all these categories.

 

What is an Ethernet Storage Fabric?

Beyond performance, the ideal Ethernet storage switch should offer Flexibility and Efficiency. That means efficient form-factors that support many ports in just one RU of space. It should support all Ethernet speeds and allow easy upgrades to port speeds, port counts, features, and the network architecture. And, of course, it should have low power consumption, be easy to manage, and be affordable, with flexible pricing and financing.

The HPE StoreFabric M-series switches combine the best of Mellanox and HPE innovation and technology. The unique form factors allow high-availability and up to 128 ports (at 10/25GbE speeds) in one RU of space. The switches deliver consistently low latency across all speeds and port combinations, letting different server and storage nodes use different speeds without any performance penalty. They support speeds up to 100GbE and have the best support for Ethernet RDMA, traffic isolation, security, telemetry, and Quality of Service (QoS).

Figure 3: HPE StoreFabric M-series Switches Support an Ethernet Storage Fabric

 

StoreFabric M-series Make Your Storage Network Future-Proof

Thanks to flexible licensing, customers can start with as few as 8 ports per switch and upgrade the port count as needed. The same switches can be used to grow storage networks from one rack to many racks with hundreds of servers and ports, without needing to discard or replace any of the original switches. The port speeds can be upgraded easily from 10 to 25 to 40/50 to 100GbE, and the switch is ready to support advanced storage protocols.

Even better, the M-series switches are designed to allow software upgrades and future integrations with specific storage, server, or cloud management tools. This means your network infrastructure investment in HPE M-series switches today will support multiple generations of HPE servers and storage arrays, making your storage network future-proof.

To learn more about the amazing new HPE StoreFabric M-series switches, contact your HPE channel partner or HPE sales rep today!

Figure 4: Upgradable port speeds, network architecture, and switch software make the HPE M-series switches future-proof.

 


The Best Flash Array Controller Is a System-on-Chip called BlueField

As the storage world turns to flash and flash turns to NVMe over Fabrics, the BlueField SoC could be the most highly integrated and most efficient flash controller ever. Let me explain why.

The backstory—NVMe Flash Changes Storage

Dramatic changes are happening in the storage market. This change comes from NVMe over Fabrics, which comes from NVMe, which comes from flash. Flash has been capturing more and more of the storage market. IDC reported that in Q2 2017, all-flash array (AFA) revenue grew 75% year-over-year while the overall external enterprise storage array market was slightly down. In the past this flash consisted entirely of SAS and SATA solid-state drives (SSDs), but flash and SSDs have long been fast enough that the SATA and SAS interfaces impose bandwidth bottlenecks and extra latency.

 

Figure 1: SATA and SAS controllers can cause a bottleneck and result in higher latency.

 

The SSD vendors developed the Non-Volatile Memory Express (NVMe) standard and command set (version 1.0 released March 2011), which runs over a PCIe interface. NVMe allows higher throughput, up to 20Gb/s per SSD today (and more in the near future), and lower latency. It eliminates the SAS/SATA controllers and requires PCIe connections, typically 4 PCIe Gen 3 lanes per SSD. Many servers deployed with local flash now enjoy the higher performance of NVMe SSDs.

 

How to Share Fast SSD Goodness

But local flash deployed this way is “trapped in the server” because each server can only use its own flash. Different servers need different amounts of flash at different times, but with a local model you must overprovision enough flash in each server to support the maximum that might be needed, even if you need the extra flash for only a few hours at some point in the future. The answer over the last 20 years has been to centralize and network the storage using iSCSI, Fibre Channel Protocol, iSER (iSCSI over RDMA), or NAS protocols like SMB and NFS.
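
A toy calculation shows why flash that is trapped in individual servers forces overprovisioning. The per-server capacities below are made up, and the pooled sizing rule (one server peaking while the rest run at their averages) is an assumption chosen only to illustrate the idea:

```python
# Hypothetical fleet: each server's average and occasional peak flash need (TB)
servers = [
    {"avg": 2, "peak": 10},
    {"avg": 3, "peak": 8},
    {"avg": 1, "peak": 12},
    {"avg": 4, "peak": 9},
]

# Local model: every server must carry its own worst case
local_total = sum(s["peak"] for s in servers)

# Shared pool: size for the busiest assumed moment (one server at peak,
# the others at their averages)
pooled_total = max(
    s["peak"] + sum(o["avg"] for o in servers if o is not s) for s in servers
)

print(f"local flash needed:  {local_total} TB")   # 39 TB in this example
print(f"pooled flash needed: {pooled_total} TB")  # 21 TB in this example
```

Centralizing the flash into a shared, networked pool avoids that waste, which is exactly what the storage protocols listed above were invented to do.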

But these all use either SCSI commands or file semantics and were not optimized for flash performance, so they can deliver good performance but not the best possible performance. As a result the NVMe community, including Mellanox, created NVMe over Fabrics (NVMe-oF) to allow fast, efficient sharing of NVMe flash over a fabric. It allows the lean and efficient NVMe commands to operate across an RDMA network with protocols like RoCE and InfiniBand. And it maintains the efficiency and low latency of NVMe while allowing sharing, remote access, replication, failover, etc.  A good overview of NVMe over Fabrics is in this YouTube video:

 

Video 1: An overview of how NVMe over Fabrics has Evolved

NVMe over Fabrics Frees the Flash But Doesn’t Come Free

Once NVMe-oF frees the flash from the server, you now need an additional CPU to run NVMe commands in a Just-a-Bunch-of-Flash (JBOF) box, plus more CPU power if it’s a storage controller running storage software. You need DRAM to store the buffers and queues. You need a PCIe switch to connect to the SSDs. And you need RDMA-capable NICs (rNICs) fast enough to support all the fast NVMe SSDs. In other words, you have to build a complete server design with enhanced internal and external connectivity to support this faster storage. For a storage controller this is not unusual, but for a JBOF it’s more complex and costly than what vendors are accustomed to building with SAS or SATA HBAs and expanders, which don’t require CPUs, DRAM, PCIe switches, or rNICs.

Also, since NVMe SSDs and the NVMe over Fabrics protocol are inherently low latency, the latency of everything else in the system—software, network, HBAs, cache or DRAM access, etc., becomes more prominent and reducing latency in those areas becomes more critical.

A New SoC Is the Most Efficient Way to Drive NVMe-oF

Fortunately there is a new way to build NVMe-oF systems: a single chip, the Mellanox BlueField, that provides everything needed other than the SSDs and the DRAM DIMMs. It includes:

  • A ConnectX-5 high-speed NIC (up to 2x100Gb/s ports, Ethernet or InfiniBand)
  • Up to 16 ARM A72 (64-bit) CPU cores
  • A built-in PCIe switch (32 lanes at Gen3/Gen4)
  • A DRAM controller and coherent cache
  • A fast mesh fabric to connect it all

 

Figure 2: BlueField (logical design illustration) includes networking, CPU cores, cache, DRAM controllers, and a PCIe switch all on one chip.

 

The embedded ConnectX-5 delivers not just 200Gb/s of network bandwidth but all the features of ConnectX-5, including RDMA and NVMe protocol offloads. This means the NVMe-oF data traffic can go directly from SSD to NIC (or NIC to SSD) without interrupting the CPU. It also means overlay network encapsulation (like VXLAN), virtual switch features (such as OVS), erasure coding, T10 Data Integrity Field (DIF) signatures, and stateless TCP offloads can all be processed by the NIC without involving the CPU cores. The CPU cores remain free to run storage software, security, encryption, or other functionality.

 

The fast mesh internal fabric enables near-instantaneous data movement between the PCIe, CPU, cache, and networking elements as needed, and operates much more efficiently than a classic server design where traffic between the SSDs and NIC(s) must traverse the PCIe switch and DRAM multiple times for each I/O. With this design, NVMe-oF data traffic queues and buffers can be handled completely in the on-chip cache and don’t need to go to the external DRAM, which is only needed if additional storage functions running on the CPU cores are applied to the data. Otherwise the DRAM can be used for control plane traffic, reporting, and management. The PCIe switch supports up to 32 lanes of either Gen3 or Gen4, so it can transfer more than 200Gb/s of data to/from SSDs and is ready for the new PCIe Gen4-enabled SSDs expected to arrive in 2018. (PCIe Gen4 can transfer 2x more traffic per lane than PCIe Gen3.)
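
For reference, here is the back-of-the-envelope PCIe math behind those numbers, using the standard per-lane signaling rates with 128b/130b encoding and ignoring packet-level protocol overhead:

```python
def pcie_usable_gbps(lanes, gt_per_s, encoding=128 / 130):
    """Approximate usable PCIe bandwidth in one direction."""
    return lanes * gt_per_s * encoding

gen3 = pcie_usable_gbps(32, 8)    # PCIe Gen3: 8 GT/s per lane
gen4 = pcie_usable_gbps(32, 16)   # PCIe Gen4: 16 GT/s per lane

print(f"32 lanes of Gen3: ~{gen3:.0f} Gb/s")  # ~252 Gb/s, enough for 200Gb/s networking
print(f"32 lanes of Gen4: ~{gen4:.0f} Gb/s")  # ~504 Gb/s of headroom for Gen4 SSDs
```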

BlueField is the FIRST SoC to include all these features and performance, making it uniquely well-suited to control flash arrays, in particular NVMe-oF arrays and JBOFs.

 

BlueField Is the Most Integrated NVMe-oF Solution

We’ve seen that in the flash storage world, performance is very important. But simplicity of design and controlling costs are also important. By combining all the components of a NVMe-oF server into a single chip, BlueField makes the flash array design very simple and lowers the cost—including allowing a smaller footprint and lower power consumption.

Figure 3: BlueField (logical design illustration) includes networking, CPU cores, cache, DRAM controllers, and a PCIe switch all on one chip.

 

Vendors Start Building Storage Solutions Based on BlueField

Not surprisingly, key Original Design Manufacturers (ODMs) and storage Original Equipment Manufacturers (OEMs) are already designing storage solutions based on the BlueField SoC. Mellanox is also working with key partners to create more BlueField solutions for network processing, cloud, security, machine learning, and other non-storage use cases. Mellanox has created a BlueField Storage Reference Platform that can handle many NVMe SSDs and serve them up using NVMe over Fabrics. This is the perfect development and reference platform to help customers and partners test and develop their own BlueField-powered storage controllers and JBOFs.

Figure 4: The BlueField Reference System helps vendors and partners quickly develop BlueField-based storage systems.

 

BlueField is the Best Flash Array Controller

The optimized performance and tight integration of all the needed components make BlueField the perfect flash array controller, especially for NVMe-oF storage arrays and JBOFs. Designs using BlueField will deliver more flash performance at lower cost, using less power than standard server-based designs.

You can see the BlueField SoC and BlueField Storage Reference Platform this week (August 8-10) at Flash Memory Summit, in the Santa Clara Convention Center, in the Mellanox booth #138.

 


The Ideal Network for Containers and NFV Microservices

Containers are the New Virtual Machine

Containers represent a hot trend in cloud computing today. They allow virtualization of servers and portability of applications minus the overhead of running a hypervisor on every host and without a copy of the full operating system in every virtual machine. This makes them more efficient than using full virtual machines. You can pack more applications on each server with containers than with a hypervisor.

Figure 1: Containers don’t replicate the entire OS for each application so have less overhead than virtual machines. Illustration courtesy of Docker, Inc. and RightScale, Inc.

 

Containers Make it Easy to Convert Legacy Appliances Into Microservices

Because they are more efficient, containers also make it easier to convert legacy networking appliances into Virtualized Network Functions (VNF) and into microservices. It’s important to understand that network function virtualization (NFV) is not the same as re-architecting functions as microservices, but that the two are still highly complementary.

 

Figure 2: Docker Swarm and Kubernetes are tools to automate deployment of containers. Using containers increases IT and cloud flexibility but puts new demands on the network.

 

The Difference Between Microservices and Plain Old NFV

Strictly speaking, NFV simply replaces a dedicated appliance with the same application running as a virtual machine or container. The monolithic app remains monolithic and must be deployed in the same manner as if it were still on proprietary hardware, except it’s now running on commercial off the shelf (COTS) servers. These servers are cheaper than the dedicated appliances but performance is often slower, because generic server CPUs generally are not great at high-speed packet processing or switching.

Microservices means disaggregating the parts of a monolithic application into many small parts that can interact with each other and scale separately. Suppose my legacy appliance inspects packets, routes them to the correct destination, and analyzes suspicious traffic. As I deploy more appliances, I get these three capabilities in exactly the same ratio, even though one particular customer (or week, or day) might require substantially more routing and very little analysis, or vice versa. However, if I break my application into specific components, or microservices that interoperate with each other, then I can scale only the services that are needed. Deploying microservices in containers means it’s easy to add, reduce, or change the mix and ratio of services running from customer to customer, or even hour to hour. It also makes applications faster to deploy and easier to develop and update, because individual microservices can be designed, tested, deployed or updated quickly without affecting all the other services.

So, NFV moves network functions from dedicated appliances to COTS servers and microservices disaggregates monolithic functions into scalable components. Doing both gives cloud service providers total flexibility in choosing which services are deployed and what hardware is used. But, one more critical element must be considered in the quest for total infrastructure efficiency—NFV optimized networking.

Figure 3: Plain NFV uses monolithic apps on commodity servers. Microservices decomposes apps into individual components that can be scaled separately.

 

Microservices and Containers Require the Right Network Infrastructure

When you decompose monolithic applications into microservices, you place greatly increased demand on the network. Monolithic apps connect their functions within one server so there is little or no east-west traffic — all traffic is north-south to and from the clients or routers. But, an app consisting of disaggregated microservices relies on the network for inter-service communication and can easily generate several times more east-west traffic than north-south traffic. Much of this traffic can even occur between containers on the same physical host, thereby taxing the virtual switch running in host software.

Figure 4: Changing to a microservices design allows flexibility to deploy exactly the services that are needed but greatly increases east-west network traffic, mandating the use of robust and reliable switches.

Moving to COTS servers also poses a performance challenge: the proprietary appliances use purpose-built chips to accelerate packet processing, while general-purpose x86 CPUs require many cycles to process packet streams, especially for small packets.

The answer to both challenges is deploying the right networking hardware. The increased east-west traffic demands a switch that is not only fast and reliable but also able to absorb microbursts of traffic while fairly allocating performance across ports. Many Ethernet switches use merchant silicon that only delivers the advertised bandwidth for larger packet sizes, or only when certain combinations of ports are used. They might drop packets unexpectedly under load or fall back from cut-through to store-and-forward forwarding, which greatly increases network latency. The main problem with these switches is that performance becomes unpredictable — sometimes it's good and sometimes it's bad, and this makes supporting cloud service level agreements impossible. Choosing the right switch, on the other hand, ensures good throughput and low latency across all packet sizes and port combinations and eliminates packet loss during traffic microbursts.
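As a rough illustration of why the fallback from cut-through to store-and-forward hurts, the extra per-hop latency is simply the time to serialize the whole frame before forwarding it. The link speed and frame sizes below are chosen for illustration only:

```python
# Extra per-hop delay when a switch must buffer the entire frame (store-and-forward)
# instead of forwarding as soon as the header is parsed (cut-through).
def store_and_forward_penalty_us(frame_bytes: int, link_gbps: float) -> float:
    """Serialization time of one frame, in microseconds."""
    return frame_bytes * 8 / (link_gbps * 1_000)

for frame in (64, 1500, 9000):  # small, standard, and jumbo frames
    penalty = store_and_forward_penalty_us(frame, 25)  # assuming a 25GbE link
    print(f"{frame:>5}-byte frame at 25 Gb/s: +{penalty:.2f} us per hop")
```

Multiply that by several hops of east-west traffic and the unpredictability described above adds up quickly.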

Figure 5: Mellanox Spectrum has up to 8x better microburst absorption capability than the Broadcom Tomahawk silicon used in many other switches. Spectrum also delivers full rated bandwidth at all packet sizes without any avoidable packet loss.

Separately from the switch, an optimized smart NIC such as the Mellanox ConnectX®-4 includes Single Root I/O Virtualization (SR-IOV) and an internal Ethernet switch, or eSwitch, to accelerate network functions. These features let each container access the NIC directly and can offload inter-container traffic from the software virtual switch using an Open vSwitch (OVS) offload technology called ASAP2. These smart NICs also offload the protocol translation for overlay networks such as VXLAN, NVGRE, and Geneve, which are used to provide improved container isolation and mobility. These features and offloads greatly accelerate container networking performance while reducing the host's CPU utilization. Faster networking plus more available CPU cycles enables more containers per host, improving cloud scalability and reducing costs.
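For example, giving containers direct access to the NIC via SR-IOV starts with carving out virtual functions (VFs) on the host. The sketch below assumes a Linux host whose NIC driver exposes the standard sriov_numvfs/sriov_totalvfs sysfs attributes; the device name and VF count are placeholders:

```python
# Minimal sketch: enable SR-IOV virtual functions on a Linux host so that
# containers (or VMs) can be attached directly to the NIC hardware.
# Assumes root privileges and a driver exposing the standard sysfs attributes.
from pathlib import Path

def enable_sriov(netdev: str, num_vfs: int) -> None:
    device = Path(f"/sys/class/net/{netdev}/device")
    total_vfs = int((device / "sriov_totalvfs").read_text())
    if num_vfs > total_vfs:
        raise ValueError(f"{netdev} supports at most {total_vfs} VFs")
    # The kernel requires resetting the VF count to 0 before changing it.
    (device / "sriov_numvfs").write_text("0")
    (device / "sriov_numvfs").write_text(str(num_vfs))

if __name__ == "__main__":
    enable_sriov("eth0", 4)  # placeholder device name and VF count
```

Each VF can then be handed to a container's network namespace, while offloads such as ASAP2 keep inter-container switching in the NIC hardware rather than in the software vSwitch.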

Figure 6: ASAP2 offloads packet processing from the software vSwitch to a hardware-accelerated eSwitch in the NIC, greatly accelerating container network performance.

 

Medallia Deploys Microservices Using Containers

Medallia provides a great case study of a modern cloud service provider that has embraced containers and advanced networking to deliver customer feedback management as Software-as-a-Service (SaaS). Medallia enables companies to track and improve their customers' experiences. Every day, Medallia must capture and analyze online and social media feedback from millions of interactions and deliver real-time analysis and reporting, including personalized dashboards for thousands of their customers' employees. Medallia wanted to run their service on commodity hardware using open standards and fully automated provisioning. They also wanted full portability of any app, service, or networking function, making it easy to move, replace, or relaunch any function on any hardware.

 

To accomplish all this, they designed a software-defined, scalable cloud infrastructure using microservices and containers on the following components:

  • Docker for container management
  • Aurora, Mesos, and Bamboo for automation
  • Ceph for storage
  • Ubuntu Linux for compute servers and Cumulus Linux for networking
  • Mellanox ConnectX-4 Lx 50GbE adapters
  • Mellanox Spectrum switches running Cumulus Linux (50GbE to servers, 100GbE for aggregation)

Figure 7 and Video 1: Medallia uses containers, Cumulus Linux, and Ceph running on Mellanox adapters and switches to deliver a superior cloud SaaS to their customers.

 

Medallia found that using end-to-end Mellanox networking hardware to underlay their containers and microservices resulted in faster performance and a more reliable network. Their Ceph networked storage performance matched that of their local storage, and they were able to automate network management tasks and reduce the number of network cables per rack. All of this enables Medallia to deliver a better SaaS to their cloud customers, who, in turn, learn how to be better listeners and vendors to their own retail customers.

Mellanox is the Container Networking Company

The quest for NFV and containerization of microservices is a noble one that increases flexibility and lowers hardware costs. However, to do this correctly, cloud service providers need networking solutions like Mellanox ConnectX-4 adapters and Spectrum switches. Using the right network hardware ensures fast, reliable and secure performance from containers and VNFs, making Mellanox the ideal NFV and Container Networking Company.


Excelero Unites NVMe Over Fabrics With Hyper-Converged Infrastructure

Two Hot IT Topics Standing Alone, Until Now…

Two of the hottest topics and IT trends right now are hyper-converged infrastructure (HCI) and NVMe over Fabrics (NVMe-oF). The hotness of HCI is evident in Nutanix's IPO in September 2016 and HPE's acquisition of SimpliVity in January 2017. The interest in NVMe-oF has been astounding, with all the major storage vendors working on it and all the major SSD vendors promoting it as well.

But the two trends have been completely separate—you could do one, the other, or both, but not together in the same architecture. HCI solutions could use NVMe SSDs but not NVMe-oF, while NVMe-oF solutions were being deployed either as separate, standalone flash arrays or NVMe flash shelves behind a storage controller. There was no easy way to create a hyper-converged solution using NVMe-oF.

 

Excelero NVMesh Combines NVMe-oF with HCI

Now a new solution launched by Excelero combines the low latency and high throughput of NVMe-oF with the scale-out and software-defined power of HCI. Excelero does this with a technology called NVMesh that takes commodity server, flash, and networking technology and connects it in a hyper-converged configuration using an enhanced version of the NVMe-oF protocol. With this solution, each node can act both as an application server and as a storage target, making its local flash storage accessible to all the other nodes in the cluster. It also supports a disaggregated flash model so customers have a choice between scale-out converged infrastructure and a traditional centralized storage array.

Figure 1: Excelero NVMesh combines NVMe-oF with HCI, much like combining peanut butter and chocolate into one tasty treat.

 

 

Remote Flash Access Without the Usual CPU Penalties

NVMesh creates a virtualized pool of block storage using the NVMe SSDs on each server and leverages a technology called Remote Direct Drive Access (RDDA) to let each node access flash storage remotely.   RDDA itself builds on top of industry-standard Remote Direct Memory Access (RDMA) networking to maintain the low latency of NVMe SSDs even when accessed over the network fabric.  The virtualized pools allow several NVMe SSDs to be accessed as one logical volume by either local or remote applications.

In a traditional hyper-converged model, the storage sharing consumes some part of the local CPU cycles, meaning they are not available for the application. The faster the storage and the network, the more CPU is required to share the storage. RDDA avoids this by allowing the NVMesh clients to directly access the remote storage without interrupting the target node’s CPU. This means high performance—whether throughput or IOPS—is supported across the cluster without eating up all the CPU cycles.

 

Recent testing showed a 4-server NVMesh cluster with 8 SSDs per server could support several million 4KB IOPS or over 6.5GB/s (>50Gb/s)—very impressive results for a cluster that size.
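As a sanity check on those numbers (purely illustrative arithmetic, not Excelero's test data), converting a 4KB I/O rate into bandwidth shows that roughly 1.6 million 4KB IOPS already corresponds to about 6.5 GB/s, which is more than a 50Gb/s link can carry:

```python
# Convert a 4KB random-I/O rate into aggregate bandwidth.
block_bytes = 4 * 1024
iops = 1_600_000  # illustrative value in the "several million" range

bytes_per_sec = iops * block_bytes
print(f"{bytes_per_sec / 1e9:.1f} GB/s  =  {bytes_per_sec * 8 / 1e9:.0f} Gb/s")
# -> roughly 6.6 GB/s, or about 52 Gb/s
```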

Figure 2: NVMesh leverages RDDA and RDMA to allow fast storage sharing with minimal latency and without consuming CPU cycles on the target. The control path passes through the management module and CPUs but the data path does not, eliminating potential performance bottlenecks.

 

Integrates with Docker and OpenStack

Another feature NVMesh has over the standard NVMe-oF 1.0 protocol is that it supports integration with Docker and OpenStack. NVMesh includes plugins for both Docker Persistent Volumes and Cinder, which makes it easy to support and manage container and OpenStack block storage. In a world where large clouds increasingly use either OpenStack or Docker, this is a critical feature.
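From the application side, consuming such a volume plugin looks like using any other Docker volume driver. The snippet below is a hedged sketch using the Docker SDK for Python; the driver name and options are placeholders, not the actual plugin's identifiers:

```python
# Sketch: create and mount a persistent volume backed by a third-party volume
# plugin. "nvmesh-placeholder" and its options are illustrative names only.
import docker

client = docker.from_env()

volume = client.volumes.create(
    name="db-data",
    driver="nvmesh-placeholder",      # hypothetical volume plugin name
    driver_opts={"size": "100GiB"},   # hypothetical option
)

# Attach the volume to a container just like any local volume.
client.containers.run(
    "alpine",
    command=["sh", "-c", "ls /data"],
    volumes={volume.name: {"bind": "/data", "mode": "rw"}},
    remove=True,
)
```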

Figure 3: Excelero’s NVMesh includes plug-ins for both Docker and OpenStack Cinder, making it easy to use it for both container and cloud block storage.

 

 

Another Step Forward in the NVMe-oF Revolution

The launch of Excelero's NVMesh is an important step forward in the ongoing revolution of NVMe over Fabrics. The open-source NVMe-oF solution delivers high performance, but only as centralized storage and without many important storage features. The NVMe-oF array solutions offer a proven appliance approach, but some customers want a software-defined storage option built on their favorite server hardware. Excelero offers all of these together: hyper-converged infrastructure, NVMe over Fabrics technology, and software-defined storage.

 


Storage Predictions for 2017

Looking at what's to come for storage in 2017, I find three simple and easy predictions, which in turn lead to four more complex ones. Let's start with the easy ones:

  • Flash keeps taking over
  • NVMe over Fabrics remains the hottest storage technology
  • Cloud continues to eat the world of IT

 

Flash keeps taking over

Every year for the past four years has been "The Year Flash Takes Over," and every year flash captures a growing share of storage capacity and spending, yet it remains in the minority. 2017 is not the year flash surpasses disk in spending or capacity — there's simply not enough NAND fab capacity yet — but it is the year all-flash arrays go mainstream. SSDs are now growing in capacity faster than HDDs (a 15TB SSD was recently announced) and every storage vendor offers an all-flash flavor. New forms of 3D NAND are lowering price per TB on one end to compete with high-capacity disks, while persistent memory technologies like 3D XPoint (not actually built on NAND flash) are pushing SSD performance even further above that of disk on the other. HDDs will still dominate low-price, high-capacity storage for some years, but they are rapidly becoming a niche technology.


Figure 1: TrendFocus 2015 chart shows worldwide hard drive shipments have fallen since 2010. Flash is one major reason; cloud is another.

 

According to IDC (Worldwide Quarterly Enterprise Storage Systems Tracker, September 2016) in Q2 2016 the all-flash array (AFA) market grew 94.5% YoY while the overall enterprise storage market grew 0%, giving AFAs 19.4% of the external (outside the server) enterprise storage systems market. This share will continue to rise.


Figure 2: Wikibon 2015 forecast predicts 4-year TCO of flash storage dropped below that of hard disk storage in 2016. 

 

NVMe over Fabrics (NVMe-oF) remains the hottest storage technology

It’s been a hot topic since 2014 and it’s getting hotter, even though production deployments are not yet widespread. The first new block storage protocol in 20 years has all the storage and SSD vendors excited because it makes their products and the applications running on them work better.  At least 4 startups have NVMe-oF products out with POCs in progress, while large vendors such as Intel, Samsung, Seagate, and Western Digital are demonstrating it regularly. Mainstream storage vendors are exploring how to use it while Web 2.0 customers want it to disaggregate storage, moving flash out of each individual server into more flexible, centralized repositories.

It's so hot because it helps vendors and customers get the most out of flash (and other non-volatile memory) storage. Analyst firm G2M, Inc. predicts the NVMe market will exceed $57 billion by 2020, with a compound annual growth rate (CAGR) of 95%. They also predict that 40% of AFAs will use NVMe SSDs by 2020, and that hundreds of thousands of those arrays will connect with NVMe over Fabrics.
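For context on what a 95% CAGR implies, the compound-growth arithmetic looks like the sketch below. The start year and base value are assumptions for illustration, not G2M's actual model:

```python
# Compound annual growth rate (CAGR) arithmetic for the forecast quoted above.
# Assumes a 2015 base year purely for illustration.
end_value = 57e9        # > $57B by 2020 (per G2M)
rate = 0.95             # 95% CAGR
years = 5               # assumed 2015 -> 2020 horizon

implied_start = end_value / (1 + rate) ** years
print(f"Implied 2015 base: ${implied_start / 1e9:.1f}B")

# Sanity check: grow the implied base forward again at 95% per year.
value = implied_start
for _ in range(years):
    value *= 1 + rate
print(f"Value after {years} years: ${value / 1e9:.1f}B")
```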


Figure 3: G2M predicts incredibly fast growth for NVMe SSDs, servers, appliances, and NVMe over Fabrics.

 

Cloud continues to eat the world of IT 

Nobody is surprised to hear that cloud is growing faster than enterprise IT. IDC reported cloud (public plus private) IT spending for Q2 2016 grew 14.5% YoY while traditional IT spending shrank 6% YoY. Cloud offers greater flexibility and efficiency, and, in the case of public cloud, the ability to replace capital expense investments with a pure OpEx model.

It’s not a panacea, as there are always concerns about security, privacy, and speed of access. Also, larger customers often find that on-premises infrastructure — often set up as private cloud — can cost less than public cloud in the long run. But there is no doubting the inexorable shift of projects, infrastructure, and spending to the cloud. This shift affects compute (servers), networking, software, and storage, and drives both cloud and enterprise customers to find more efficient solutions that offer lower cost and greater flexibility.


Figure 4: IDC Forecasts cloud will consume >40% of IT infrastructure spending by 2020. Full chart available at:  http://chartchannel.icharts.net/chartchannel/worldwide-cloud-it-infrastructure-market-forecast-deployment-type-2015-2020-shares

 

OK Captain Obvious, Now Make Some Real Predictions!

Now let’s look at the complex predictions which are derived from the easy ones:

  • Storage vendors consolidate and innovate
  • Fibre Channel continues its slow decline
  • Ceph grows in popularity for large customers
  • RDMA becomes more prevalent in storage

 

Traditional storage vendors consolidate and innovate

Data keeps growing at over 30% per year, but spending on traditional storage is flat. This is forcing vendors to fight harder for market share by innovating more quickly to make their solutions more efficient, flexible, flash-focused, and cloud-friendly. Vendors that previously offered only standalone arrays now offer software-defined options, cloud-based storage, and more converged or hyper-converged infrastructure (HCI) options. For example, NetApp offers options to replicate or back up data from NetApp boxes to Amazon Web Services, while Dell EMC, HDS, and IBM all sell converged infrastructure racks. In addition, startup Zadara Storage offers enterprise storage-as-a-service running either in the public cloud or as an on-premises private cloud.

Meanwhile, major vendors all offer software versions of some of their products instead of only selling hardware appliances. For example, EMC ScaleIO, IBM Spectrum Storage, IBM Cloud Object Storage (formerly CleverSafe), and NetApp ONTAP Edge are all available as software that runs on commodity servers.

The environment for flash startups is getting tougher because all the traditional vendors now offer their own all-flash flavors. There are still startups making exciting progress in NVMe over Fabrics, object storage, hyper-converged infrastructure, data classification, and persistent memory, but only a few can grow into profitability on their own. 2017 will see a round of acquisitions as storage vendors who can’t grow enough organically look to expand their portfolios in these areas.

 

Fibre Channel Continues its Downward Spiral

One year ago I wrote a blog about why Fibre Channel (FC) is doomed, and all signs (and analyst forecasts) point to its continued slow decline. All the storage trends around efficiency, flash, performance, big data, Ceph, machine learning, object storage, containers, HCI, etc. are moving against Fibre Channel. (Remember the "Cloud Eats the World" chart above? Those cloud providers definitely don't want to use FC either.) The only thing keeping FC hopes alive is the rapid growth of all-flash arrays, which mostly deploy FC today because they are replacing legacy disk or hybrid FC arrays. But even AFAs are trending toward more Ethernet and InfiniBand (and occasionally direct PCIe connections) to get more performance and flexibility at lower cost.

The FC vendors know the best they can hope for is to slow the rate of decline, so all of them have been betting on growing their Ethernet product lines. More recently, the FC vendors (Emulex, QLogic, Brocade) have been acquired by larger companies, not as hot growth engines, but so the acquirers can milk the cash flow from expensive FC hardware before their customers convert to Ethernet and escape.

 

Ceph grows in Popularity for Large Customers

Ceph — both the community version and Red Hat Ceph Storage — continues to gain fans and use cases. Originally seen as suited only for storing big content on hard drives (low-cost, high-capacity storage), it has since gained features and performance that make it suitable for other applications. Vendors like Samsung, SanDisk (now WD), and Seagate are demonstrating Ceph on all-flash storage, while Red Hat and Supermicro teamed up with Percona to show that Ceph works well as database storage (and is less expensive than Amazon storage for running MySQL). I wrote a series of blogs on Ceph's popularity, optimizing Ceph performance, and using Ceph for databases.

Ceph is still the only storage solution that is software-defined, open source, and scale-out while offering enterprise storage features (though Lustre is approaching this as well). Major contributors to Ceph development include not just Red Hat but also Intel, the drive and SSD makers, Linux vendors (Canonical and SUSE), Ceph customers, and, of course, Mellanox.

In 2016, Ceph added features and stability to its file/NAS offering, CephFS, as well as major performance improvements for Ceph block storage. In 2017, Ceph will improve performance, management, and CephFS even more while also enhancing RDMA support. As a result, its adoption will grow beyond its traditional base to include telcos, cable companies, and large enterprises that want a scalable software-defined storage solution for OpenStack.

 

 

RDMA More Prevalent in Storage

RDMA, or Remote Direct Memory Access, has actually been prevalent in storage for a long time as a cluster interconnect and for HPC storage. Just about all the high-performance scale-out storage products use Mellanox-powered RDMA for their cluster communications — examples include Dell FluidCache for SAN, EMC XtremIO, EMC VMAX3, IBM XIV, InfiniDat, Kaminario, Oracle Engineered Systems, Zadara Storage, and many implementations of Lustre and IBM Spectrum Scale (GPFS).

The growing use of flash media and intense interest in NVMe-oF are accelerating the move to RDMA. Faster storage requires faster networks, not just more bandwidth but also lower latency, and in fact the NVMe-oF spec requires RDMA to deliver its super performance.
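The reasoning behind that is simple arithmetic: as the media gets faster, a fixed chunk of network and software overhead becomes a larger and larger share of end-to-end latency. The latencies below are rough, assumed orders of magnitude for illustration, not measured values:

```python
# Share of end-to-end I/O latency consumed by network/software overhead as
# storage media get faster. All latencies are assumed, round-number examples.
overhead_us = 30.0  # assumed fixed network + software stack overhead (non-RDMA)

for label, device_us in [("HDD", 5000.0), ("NAND SSD", 90.0), ("next-gen NVM", 10.0)]:
    total = device_us + overhead_us
    share = overhead_us / total
    print(f"{label:>12}: {share:.0%} of end-to-end latency is network/software")
```

Cutting that overhead is exactly what RDMA does, which is why faster media keeps pulling RDMA deeper into storage.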


Figure 5: Intel presented a chart at Flash Memory Summit 2016 showing how the latency of storage devices is rapidly decreasing, leading to the need to decrease software and networking latency with higher-speed networks (like 25GbE) and RDMA.

In addition to the exploding interest in NVMe-oF, Microsoft has improved support for RDMA access to storage in Windows Server 2016 with SMB Direct and Storage Spaces Direct, and Ceph's RDMA support is getting an upgrade. VMware enhanced support for iSER (iSCSI Extensions for RDMA) in vSphere in 2016, and more storage vendors, such as Oracle (in tape libraries) and Synology, have added iSER support to enable accelerated client access. On top of this, multiple NIC vendors (not just Mellanox) have announced support for RoCE (RDMA over Converged Ethernet) at 25, 40, 50, and 100Gb Ethernet speeds. These changes all mean more storage vendors and storage deployments will leverage RDMA in 2017.

 

So Let’s Get This Party Started

2017 promises to be a super year for storage innovation. With technology changes, disruption, and consolidation, not every vendor will be a winner and not every storage startup will find hockey-stick growth and riches, but it’s clear the storage hardware and software vendors are working harder than ever, and customers will be big winners in many ways.