We’re living at a fascinating point in time for data center development. Modern data centers continue to evolve from physical to virtual, cloud infrastructure is expanding from public and private into hybrid deployments, and we are witnessing a shift from spinning storage media to the age of flash. Moreover, new compute platforms that allow for more open data center models are ushering in an era of commodity architecture, which is making its way into cloud providers, service providers, and even enterprise data centers.
Hyperconvergence is one of the models adopting this architectural shift, moving away from proprietary and expensive storage arrays toward open-standard compute and storage architectures built around off-the-shelf commodity servers. Organizations can now use commodity hardware to implement the latest hyperconverged solutions, which can compete with large storage arrays without breaking the bank.
However, can these solutions keep up with the performance of typical storage platforms? Yes. In fact, Microsoft recently set a performance record with the latest Storage Spaces Direct (S2D) edition in Windows Server 2019, demonstrating an impressive 13.7M IOPS in a hyperconverged configuration.
Noticing the gap between Microsoft’s IOPS record and the maximum raw IOPS of the hardware used in that setup, StarWind reached out to us at Mellanox and offered to collaborate on building a high-performance cluster that could surpass any IOPS record reported to date. StarWind has developed unique core competencies in extracting maximum performance from Windows Server, notably the StarWind iSCSI Accelerator, a driver that distributes workloads more evenly in Hyper-V environments. With the accelerator, all CPU cores are put to work effectively: none sit idle, and none are overwhelmed. Mellanox, known for blazing-fast networking, would provide the adapters, switches, and cables for the interconnects and networking. We know something about performance too, having pioneered Remote Direct Memory Access (RDMA) and NVMe over Fabrics (NVMe-oF), both game-changers for storage performance. Our networking gear delivers exceptionally high data transfer rates at the lowest possible latency, with offloads and accelerators that free CPU cycles for storage computational tasks.
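StarWind does not describe how its accelerator spreads work, so the sketch below is only a generic illustration of the goal named above (no idle cores, no overloaded cores). It uses a swapped-in, well-known technique, greedy least-loaded assignment with heaviest-first ordering (the LPT heuristic); the function names, session costs, and core count are hypothetical.

```python
import heapq
from typing import Dict, List


def assign_sessions(num_cores: int, costs: List[int]) -> Dict[int, List[int]]:
    """Assign each I/O session to whichever core currently has the least work.

    Hypothetical illustration of load spreading; `costs` are abstract work
    units per session, not measurements from any real driver.
    """
    heap = [(0, core) for core in range(num_cores)]  # (accumulated load, core id)
    heapq.heapify(heap)
    placement: Dict[int, List[int]] = {core: [] for core in range(num_cores)}
    # Handling the heaviest sessions first (the LPT heuristic) tightens the balance.
    for session, cost in sorted(enumerate(costs), key=lambda pair: -pair[1]):
        load, core = heapq.heappop(heap)             # least-loaded core so far
        placement[core].append(session)
        heapq.heappush(heap, (load + cost, core))
    return placement


def core_loads(placement: Dict[int, List[int]], costs: List[int]) -> Dict[int, int]:
    """Total work units placed on each core."""
    return {core: sum(costs[s] for s in sessions) for core, sessions in placement.items()}


costs = [5, 3, 8, 2, 7, 4, 6, 1]
print(core_loads(assign_sessions(4, costs), costs))  # {0: 9, 1: 9, 2: 9, 3: 9}
```

Pinning all eight sessions to one core would leave that core with 36 work units and the other three idle; the greedy placement evens them out at 9 units each.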
For this benchmark, our goal was to set a new performance record. StarWind provided the shared storage capacity, paired with best-of-breed vendors: servers from Supermicro, NVMe flash drives from Intel, and Microsoft Hyper-V as the hypervisor. The collaboration was an ambitious project, as the goal was a high-performance, highly available (HA) cluster capable of delivering raw IOPS with latency comparable to that of a local NVMe device, 10ms or less.
The StarWind team built a cluster similar to the one Microsoft used to set its world record, this time packing it with NVMe drives, and benchmarked its performance in three scenarios. The testbeds differed slightly: they shared most of the hardware and the interconnection scheme but used very different software components and settings.
For the first test, a cache-less, all-flash iSCSI/iSER hyperconverged cluster was built as a traditional 2-node StarWind HyperConverged Appliance (HCA), but “on steroids”: it was scaled out to a whopping 12 nodes. Each node was a Supermicro SuperServer 2029UZ-TR4+ with dual Intel® Xeon® Platinum 8268 processors, two Intel® Optane™ SSD DC P4800X NVMe flash drives, and two Mellanox ConnectX-5 100 GbE NICs, all connected through a Mellanox SN2700 Spectrum™ switch with Mellanox LinkX® copper cables. Virtual machines ran directly from the NVMe flash drives, which StarWind VSAN pooled and carved into virtual volumes. Because this test was meant to measure I/O latency, no cache was used.
For the second run, we used the same hardware and software setup, but configured the cluster to run workloads on comparatively slower M.2 SATA flash drives. Twice as many Intel Optane cards were used, exclusively as write-back cache, since this test aimed for both maximum IOPS and maximum bandwidth.
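Write-back caching is what lets the slower SATA tier keep up in this configuration: writes are acknowledged once they land on the fast Optane tier and reach the SATA drives later. The following is a minimal, generic sketch of that policy, assuming LRU eviction and a plain dict standing in for each tier; the class and method names are hypothetical, and this is not StarWind's cache implementation.

```python
from collections import OrderedDict
from typing import Dict, Set


class WriteBackCache:
    """Writes land in a fast tier and reach the slow tier only on eviction or flush.

    Generic illustration: capacity is counted in blocks and eviction is LRU,
    both assumptions made for the sketch.
    """

    def __init__(self, backing: Dict[int, bytes], capacity: int) -> None:
        self.backing = backing                     # stands in for the slower SATA tier
        self.capacity = capacity                   # blocks the fast tier can hold
        self.cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.dirty: Set[int] = set()               # blocks not yet written to backing

    def write(self, lba: int, data: bytes) -> None:
        # The write is complete as soon as the fast tier holds it.
        self.cache[lba] = data
        self.cache.move_to_end(lba)
        self.dirty.add(lba)
        self._evict_if_needed()

    def read(self, lba: int) -> bytes:
        if lba in self.cache:                      # hit: serve from the fast tier
            self.cache.move_to_end(lba)
            return self.cache[lba]
        data = self.backing[lba]                   # miss: fetch from the slow tier
        self.cache[lba] = data
        self._evict_if_needed()
        return data

    def flush(self) -> None:
        # Persist every dirty block to the slow tier.
        for lba in list(self.dirty):
            self.backing[lba] = self.cache[lba]
        self.dirty.clear()

    def _evict_if_needed(self) -> None:
        while len(self.cache) > self.capacity:
            lba, data = self.cache.popitem(last=False)   # least recently used block
            if lba in self.dirty:                        # write back before dropping it
                self.backing[lba] = data
                self.dirty.discard(lba)
```

For example, with `capacity=2`, two writes leave the backing dict empty; a third write evicts the oldest block and only then copies it to the backing tier.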
For the final test, we used the same hyperconverged infrastructure (HCI) cluster, but replaced iSCSI/iSER with the SPDK NVMe-oF target and the StarWind NVMe-oF initiator to prove we could obtain even better performance than before. In short: Intel Optane, NVMe-oF, no cache, the shortest possible I/O path, and Windows. The primary purpose was to demonstrate how efficiently the StarWind NVMe-oF initiator presents PCIe flash.
In virtualization and hyperconverged infrastructures, performance is commonly judged by the number of input/output (I/O) operations per second, or IOPS: essentially, how many reads or writes the virtual machines can perform. A single VM can generate a considerable number of random or sequential reads and writes, but real production environments usually run many VMs, which makes the combined data flow fully random. Hyper-V virtual machines use a 4K block size, so we configured the benchmark’s I/O patterns with 4K blocks.
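Because every operation moves one block, IOPS and bandwidth are tied together by the block size. A small sketch of that conversion, assuming “4K” means 4096 bytes and ignoring protocol overhead; the function names are illustrative only.

```python
def iops_to_bytes_per_second(iops: float, block_size: int = 4096) -> float:
    """Throughput implied by an IOPS rate at a fixed block size (4K assumed to be 4096 bytes)."""
    return iops * block_size


def to_gigabits(bytes_per_second: float) -> float:
    """Convert bytes/s to Gbit/s (decimal units), the unit network links are rated in."""
    return bytes_per_second * 8 / 1e9


rate = iops_to_bytes_per_second(1_000_000)   # one million 4K IOPS
print(rate / 1e9, "GB/s")                    # 4.096 GB/s
print(to_gigabits(rate), "Gbit/s")           # 32.768 Gbit/s
```

At 4K, a million IOPS already needs about a third of a 100 GbE link, which is why the interconnect matters as much as the drives in these tests.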
The first test, on the cache-less 12-node HCA cluster, delivered 6.7 million IOPS, 51% of the theoretical 13.2 million IOPS. This is breakthrough performance for a simple production configuration, with clients accessing storage over iSCSI without RDMA. The backbone ran over iSER and used no proprietary technology, so similar results can be obtained with many hypervisors using plain iSCSI initiators, StarWind Virtual SAN, and Mellanox interconnects.
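The 13.2 million IOPS ceiling is consistent with simple per-drive arithmetic: 12 nodes × 2 Optane drives × roughly 550,000 IOPS each. The per-drive figure is our assumption, taken from Intel’s published 4K random-read rating for the P4800X rather than from the benchmark itself; the sketch reproduces the ceiling and the 51% efficiency under that assumption.

```python
NODES = 12
OPTANE_PER_NODE = 2          # from the first test's hardware description
IOPS_PER_DRIVE = 550_000     # assumed: Intel's 4K random-read rating for the P4800X


def theoretical_iops(nodes: int, drives_per_node: int, iops_per_drive: int) -> int:
    """Upper bound if every drive ran at its rated IOPS and nothing else limited the cluster."""
    return nodes * drives_per_node * iops_per_drive


def efficiency(measured: float, theoretical: float) -> float:
    """Fraction of the theoretical ceiling actually achieved."""
    return measured / theoretical


ceiling = theoretical_iops(NODES, OPTANE_PER_NODE, IOPS_PER_DRIVE)
print(ceiling)                               # 13200000
print(f"{efficiency(6.7e6, ceiling):.0%}")   # 51%
```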
The second test used a production configuration in which StarWind Virtual SAN provided shared storage with write-back cache, with the backbone running over iSER and clients accessing storage over iSCSI without RDMA. No proprietary technologies were used in our environment, so similar performance can be obtained with any hypervisor using iSCSI initiators, StarWind Virtual SAN, Intel Optane configured as the caching device, and Mellanox interconnects.
For the final test, the breakthrough performance came from configuring NVMe-oF on the cluster: the fastest NVMe storage passed through to an SPDK NVMe target VM, the StarWind NVMe-oF initiator, and Mellanox NVMe-oF-ready adapters and switches.
This groundbreaking demo reflects a collaboration of industry-leading companies, featuring the production-ready StarWind HyperConverged Appliance (HCA) built with the newest Supermicro motherboards and the most recent Intel CPUs and NVMe flash memory, all running over a Mellanox end-to-end RDMA-enabled network. The benchmark exposes the true performance capabilities of hyperconverged infrastructure (HCI): the 12-node all-flash NVMe cluster delivered 26.834 million IOPS, 101.5% of the theoretical 26.4 million IOPS. Similar production-ready environments can easily be built today from commodity components that are more than capable of delivering enterprise-class performance without the cost, risk, or complexity, providing more than enough IOPS for today’s enterprises to meet the growing demands of the data explosion. Exciting times we live in, indeed!