The high bandwidth requirements of modern data centers are driven by the demands of business applications, the data explosion, and the much faster storage devices available today. For example, saturating a 100GbE link once took about 250 hard drives; today, three NVMe SSDs are enough.
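The drive counts above follow from simple arithmetic. Here is a minimal sketch, assuming roughly 50 MB/s of sustained throughput per hard drive and roughly 4,200 MB/s per NVMe SSD; both per-drive figures are illustrative assumptions, not measurements from this article.

```python
import math

# Assumed per-drive sustained throughput (illustrative, not measured).
HDD_MB_PER_S = 50
NVME_MB_PER_S = 4200


def drives_to_fill(link_gbps: float, drive_mb_per_s: float) -> int:
    """Number of drives whose combined throughput reaches the link rate."""
    link_mb_per_s = link_gbps * 1000 / 8  # gigabits/s -> megabytes/s
    return math.ceil(link_mb_per_s / drive_mb_per_s)


print(drives_to_fill(100, HDD_MB_PER_S))   # 250
print(drives_to_fill(100, NVME_MB_PER_S))  # 3
```

Plug in the throughput of your own drives to see how few devices it takes to fill a given link.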
After investing in the most advanced network infrastructure, the highest-bandwidth links, the shiniest SDN controller, and the latest cloud automation tools, you expect to fully utilize each link, whether 100GbE, 25GbE, or legacy 10GbE, and to reach the highest IOPS your software-defined storage solution can deliver. But is a collection of cutting-edge technologies enough?
Moving to a scale-out paradigm is common practice today. With hyper-converged solutions in particular, data traffic runs continuously east-west between storage and compute nodes. Even with 10GbE interfaces on individual servers, the aggregate data flow can fully utilize the 100GbE links between leaf and spine switches. In addition, software-defined storage generates extra traffic just to maintain the solution, which is yet another consumer of network bandwidth.
To get the most from your network equipment, you need to look at it from a PCIe-to-PCIe perspective, define the specific use case, and run a few simulations. Let us consider a simple example of an OpenStack deployment:
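To make the aggregation point concrete, here is a rough sketch of how server-facing bandwidth compares with leaf-to-spine uplink capacity. The server and uplink counts are hypothetical, chosen only for illustration.

```python
def aggregate_gbps(servers: int, nic_gbps: float) -> float:
    """Worst-case traffic a leaf switch can receive from its servers."""
    return servers * nic_gbps


def oversubscription(servers: int, nic_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to uplink bandwidth (1.0 = non-blocking)."""
    return aggregate_gbps(servers, nic_gbps) / (uplinks * uplink_gbps)


# Ten 10GbE servers are already enough to fill one 100GbE uplink.
print(aggregate_gbps(10, 10))            # 100

# Hypothetical leaf: 40 servers at 10GbE sharing two 100GbE uplinks.
print(oversubscription(40, 10, 2, 100))  # 2.0
```

A ratio above 1.0 means the uplinks can become the bottleneck whenever enough servers transmit east-west at the same time.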
Now, where are those pesky bottlenecks?
VXLAN UDP packets hit the NIC on the server, and the NIC has no idea what to do with this creature, so it pushes it up to the kernel. Once software is involved in the data plane, it is game over for high performance. The only way to sustain 10Gbps is for the NIC to know how to look inside the UDP packet and parse the inner frame for checksum, RSS, TSS, and the other operations that are handled natively with a simple VLAN. If the NIC cannot do that, the CPU has to, at the expense of your application.
So far, we have managed to achieve higher CPU utilization and lower performance. But what about the switch?
Can my switch sustain 100GbE between the ToR and the spine? Losing packets means retransmissions, so how can you be sure that your switch has zero packet loss?
Ceph is now pushing 50Gb/s toward the compute nodes' 10GbE interfaces. Congestion occurs, and because the compute nodes are dispersed, you cannot design the network so that the congestion points are predictable. So the question remains: can you be sure the switch will handle this kind of congestion fairly?
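To show what "getting inside the UDP packet" involves, here is a minimal software sketch of the parsing step, based on the 8-byte VXLAN header defined in RFC 7348. It illustrates the work an offload-capable NIC does in hardware; it is not how any particular NIC or driver implements it, and the function name is made up for this example.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_HEADER_LEN = 8         # flags (1) + reserved (3) + VNI (3) + reserved (1)
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def parse_vxlan(udp_payload: bytes):
    """Return (vni, inner_frame) for the payload of a VXLAN UDP datagram."""
    if len(udp_payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload shorter than the VXLAN header")
    # One flags byte, three reserved bytes skipped, then a 32-bit word
    # holding the 24-bit VNI followed by one reserved byte.
    flags, vni_word = struct.unpack("!B3xI", udp_payload[:VXLAN_HEADER_LEN])
    if not flags & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8, udp_payload[VXLAN_HEADER_LEN:]


# Build a header for VNI 5000 and wrap a placeholder inner frame.
packet = struct.pack("!B3xI", VXLAN_FLAG_VNI_VALID, 5000 << 8) + b"inner-frame"
vni, inner = parse_vxlan(packet)
print(vni, inner)  # 5000 b'inner-frame'
```

Only after this step can checksums and receive-side scaling be computed on the inner frame, which is why doing it per packet on the CPU is so costly.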
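A back-of-the-envelope sketch shows how quickly such congestion exhausts switch buffering. It assumes, for simplicity, that all of the offered load targets a single 10GbE egress port and that 16 MB of buffer is available to that port; the buffer size is a hypothetical figure, not the specification of any particular switch.

```python
def buffer_fill_time_ms(offered_gbps: float, egress_gbps: float,
                        buffer_mb: float) -> float:
    """Milliseconds until the buffer overflows under sustained overload."""
    excess_gbps = offered_gbps - egress_gbps
    if excess_gbps <= 0:
        return float("inf")  # the port keeps up, so the buffer never fills
    buffer_bits = buffer_mb * 8e6  # megabytes -> bits
    return buffer_bits / (excess_gbps * 1e9) * 1e3


# 50 Gb/s arriving for a 10 Gb/s port with an assumed 16 MB of buffer:
# the buffer is full after roughly 3.2 ms, and packets start to drop.
print(round(buffer_fill_time_ms(50, 10, 16), 2))
```

Once the buffer is full, how the switch shares the remaining space and the drops among the competing flows determines whether congestion is handled fairly.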
A VXLAN Tunnel End Point (VTEP) is needed to connect bare-metal servers to the virtual networks, and this should be handled by the switch hardware. Another VTEP can run on the hypervisor, but then OVS becomes the bottleneck. So, what if we offloaded it to the NIC?
I could go on and on about how the TCP/IP flow involves the CPU in network operations, but now let’s talk about deploying 100GbE infrastructure and getting to a 100GbE SDN deployment via:
And now, what about 25GbE from the server to the top-of-rack switch? You already have the infrastructure. Make sure that your cables are SFP28 capable; the form factor is the same, and you are all set for the next step. You are now ready for 25G.