There is no doubt that, in a relatively short period of time, Microsoft has become the worldwide leading Software-as-a-Service (SaaS) cloud provider. Recent data shows that Microsoft continues to grow: in addition to the existing 30 locations, four new locations are coming soon.
There is also no doubt that scalability and efficiency were among the top priorities of Microsoft’s hardware team, which was originally chartered to build the Azure cloud infrastructure. These two qualities are key for any cloud and have a direct impact on Quality of Service (QoS) and Return on Investment (ROI). As such, choosing the right architecture and solutions is key to establishing a sustainable differentiation, which in turn is critical to a company’s ability to meet its business goals.
In addition to easy scalability, a requirement driven mostly by the tremendous growth of storage, which is doubling roughly every two years and is expected to grow from 4.4 zettabytes in 2013 to over 44 zettabytes in 2020, the need to analyze data in real time is driving the need to reduce latency and increase infrastructure throughput using a scale-out architecture. This, in turn, boosts data analytics and greatly enhances the interactive user experience.
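As a quick sanity check, the two growth figures quoted above are consistent with each other: a tenfold increase over seven years works out to a doubling period of just over two years.

```python
import math

# Storage grows from 4.4 ZB (2013) to 44 ZB (2020): a 10x increase over 7 years.
growth_factor = 44 / 4.4          # 10.0
years = 2020 - 2013               # 7

# If growth_factor = 2 ** (years / T), then the doubling period T is:
doubling_period = years * math.log(2) / math.log(growth_factor)
print(f"Implied doubling period: {doubling_period:.2f} years")  # ~2.11 years
```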
To achieve this goal, the Microsoft team decided to build an architecture and use technologies that offload as many network functions as possible to the networking subsystems, and chose Mellanox’s efficient 40Gb/s Ethernet products for Azure’s mission-critical worldwide deployments. This decision has already proven to be a sound one, as it prevents the bottlenecks associated with waiting for the CPU to complete a job. This, of course, improves overall cloud efficiency, QoS and ROI.
One of the best examples of efficient offload is the use of RDMA-enabled networks vs. TCP/IP. Microsoft has already published numerous papers and videos that validate the higher efficiency RDMA enables, including Jose Barreto’s video. In the video, Barreto uses Mellanox ConnectX-4® 100GbE to compare the performance of TCP/IP vs. RDMA (Ethernet vs. RoCE) and clearly shows that RoCE delivers almost two times higher bandwidth and two times lower latency than Ethernet, at 50 percent of the CPU utilization required for the same data communication task. (More data can be found at: Enabling Higher Azure Stack Efficiency – Networking Matters.)
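The CPU saving comes from where the bytes are moved. A minimal illustration using plain Python sockets (an actual RDMA data path requires verbs-capable hardware and libraries such as libibverbs, which is not shown here):

```python
import socket

# With TCP sockets, every message crosses the user/kernel boundary twice:
# send() copies from the application buffer into kernel socket buffers, and
# recv() copies from kernel buffers into a *new* application buffer -- all of
# it CPU work. RDMA (RoCE) instead registers application memory with the NIC,
# which moves data directly between registered buffers on both hosts,
# bypassing the kernel and these per-message CPU copies.

a, b = socket.socketpair()      # loopback byte stream, standing in for a TCP connection
payload = b"x" * 4096

a.sendall(payload)              # CPU copy #1: user buffer -> kernel
received = b.recv(8192)         # CPU copy #2: kernel -> a fresh user buffer

assert received == payload
assert received is not payload  # a distinct object: the bytes were moved by the CPU
a.close(); b.close()
```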
Offloading the CPU not only frees up more CPU cycles to run applications, but also eliminates the computational bottlenecks that arise when the CPU cannot process data fast enough to feed the network at maximum wire speed. This is why Azure architects decided to use RoCE connectivity for Azure storage: it eliminates the typical bottlenecks associated with accessing storage.
One of the most impressive and convincing benchmarks demonstrating the differentiation that Mellanox’s RoCE solutions enable is a record-breaking benchmark performed on a 12-node cluster of HPE DL380 G9 servers, each with four Micron 9100MAX NVMe storage cards and dual Mellanox ConnectX-4 100GbE NICs, all connected by Mellanox’s Spectrum 100GbE switch and LinkX™ cables. The cluster delivered an astonishing, sustained 1.2Tb/s of application-to-application bandwidth, which, of course, enables higher data center efficiency. A separate benchmark, using a smaller cluster of only four nodes to run MS SQL, achieved a record performance of 2.5 million transactions per minute (using a SOFS architecture).
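For context, simple arithmetic shows how close that figure is to the cluster’s raw capacity. This is a back-of-the-envelope sketch based on the hardware listed above, not part of the published benchmark, and it assumes one plausible accounting of the traffic:

```python
nodes = 12
nics_per_node = 2        # dual ConnectX-4 NICs per server
gbps_per_nic = 100

aggregate_line_rate = nodes * nics_per_node * gbps_per_nic
print(aggregate_line_rate)  # 2400 Gb/s of raw NIC capacity

# Within a single cluster, every delivered byte is transmitted by one node and
# received by another, so 1.2 Tb/s of application-to-application bandwidth
# consumes roughly 2.4 Tb/s of aggregate NIC bandwidth -- i.e. the cluster is
# running at essentially full line rate.
```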
RDMA isn’t the only offload engine available today. Another good example is the three times higher application efficiency that NVGRE offload enables in a Cloud Platform System (CPS) vs. running over the CPU. Many other offloads are available in Mellanox’s ConnectX family of NICs, including Erasure Coding, VXLAN, Geneve, Packet Pacing and more.
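To make the encapsulation offloads concrete: for every packet in an overlay network, someone must build and parse an extra header and recompute checksums over it; done in software, this costs CPU cycles per packet, which is exactly the work the NIC reclaims. A minimal sketch of the 8-byte VXLAN header defined in RFC 7348:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header from RFC 7348.

    Byte 0: flags (0x08 = 'VNI present'), bytes 1-3: reserved,
    bytes 4-6: the 24-bit VXLAN Network Identifier, byte 7: reserved.
    """
    assert 0 <= vni < 2 ** 24
    flags_and_reserved = 0x08 << 24   # flags in the top byte, reserved zeros below
    vni_and_reserved = vni << 8       # VNI in the top 24 bits, reserved low byte
    return struct.pack("!II", flags_and_reserved, vni_and_reserved)

hdr = vxlan_header(vni=5000)
assert len(hdr) == 8
assert hdr[0] == 0x08
assert int.from_bytes(hdr[4:7], "big") == 5000
```

A hardware VXLAN offload builds and strips this header (plus the outer UDP/IP headers) in the NIC, so the hypervisor’s CPU never touches it.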
Microsoft’s Azure team hasn’t stopped at advanced NICs, however. Modern hyper-scale datacenters with hundreds of thousands to millions of servers require much higher levels of offloading, which drives the need for an FPGA in the I/O path, between the server and the switches. Such a programmable device, acting as a “bump-in-the-wire,” can offload CPU-hungry network functions from the CPU to the FPGA and enable data communication at wire speed. A good example is the offload of IPsec or TLS, which can run on Mellanox’s Innova IPsec 4 Lx EN Adapter; it enables a four times improvement in crypto performance and frees the CPU cores to run the real users’ applications.
Security is just one of many functions that can be offloaded from the CPU to the network. Other functions, such as software-defined networking (SDN) controllers or deep machine learning algorithms, can use the FPGA to maximize the scalability and agility of the infrastructure without replacing the existing deployment. Some of these functions are already available, or are expected to become available, in standard NICs, including the existing ConnectX-5 and the upcoming ConnectX-6, which will enable in-network computing. In addition, a new class of light, agile and fast co-processors, such as Mellanox’s BlueField, is emerging; these will free the CPU to drive data at wire speeds of 25Gb/s or 50Gb/s, and soon 100Gb/s. All of these are expected to drive cloud efficiency to new record heights.