As covered in the previous blog posts (Part 1 and Part 2), the Virtual Modular Switch (VMS) concept is a clear advantage for medium- to large-scale networks. As we move into huge networks where multiple modular switches would be needed anyway, this advantage diminishes to the point where choosing between a VMS and multiple chassis becomes a matter of preference.
When the odds are even, this preference can come down to equipment cost, the cost of operating the equipment, specific network KPIs that must be met, or any other parameter the network operator cares about.
The Mellanox implementation of VMS is based on our own ASIC design, known as SwitchX. It is used as the fabric element in each of our Ethernet (and InfiniBand) switch product lines. SwitchX provides 36 high-speed interfaces of standard 40GbE. In a non-blocking fat-tree topology, 18 of these ports serve as external interfaces and 18 as internal interfaces toward the spine layer of the VMS fat tree. Having 36 ports on each spine element allows as many as 36 leaf elements, so the total number of external ports in a non-blocking two-tier VMS is 36 × 18 = 648.
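The port arithmetic above can be sketched in a few lines. This is an illustrative calculation only (the function and field names are not from any Mellanox tool); it assumes a classic two-tier leaf/spine fat tree built from a single switch radix:

```python
# Port arithmetic for a non-blocking two-tier (leaf/spine) fat tree.
# `radix` is the number of ports per switch element (36 for SwitchX).
# Names here are illustrative, not part of any vendor tooling.

def two_tier_capacity(radix: int) -> dict:
    down = radix // 2        # leaf ports used as external interfaces
    up = radix - down        # leaf ports used as uplinks to the spine
    leaves = radix           # each spine port can attach one leaf
    return {
        "external_per_leaf": down,
        "spines": up,        # one spine per uplink keeps the tree non-blocking
        "leaves": leaves,
        "total_external": leaves * down,
    }

print(two_tier_capacity(36))
# radix 36: 18 external + 18 internal per leaf, up to 36 leaves,
# and 36 * 18 = 648 external ports in total.
```

Halving the radix at the leaf (18 down, 18 up) is exactly what makes the fabric non-blocking: aggregate uplink bandwidth equals aggregate external bandwidth.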
As noted above, the SwitchX ASIC is also used as the fabric of FDR InfiniBand switches, which run at 56Gb/s. This means the standard 40GbE links can actually run at a non-standard speed of 56GbE. This obviously works only between Mellanox devices, but since the VMS is designed as a complete solution, it is safe to assume that the spine and leaf devices come from the same vendor.
In the Mellanox case, this translates to internal links running 40% faster than the external ports. To maintain a non-blocking VMS, each leaf device can then use 20 interfaces (actually 21) as external ports and 16 (actually 15) as internal ports. This raises the port count of the two-tier VMS solution from 648 to 720 (actually 756) using the same equipment. The drawing below shows such a configuration, using 56GbE internal links to build a VMS of 360 ports.
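The "actually 21" figure falls out of a simple bandwidth bound: a leaf stays non-blocking as long as its aggregate external bandwidth does not exceed its aggregate uplink bandwidth. A hedged sketch of that bound (function name and structure are my own, for illustration):

```python
# Largest non-blocking external port count on a leaf whose uplinks run
# faster than its external ports. The constraint is:
#     ext * ext_gbps <= (radix - ext) * int_gbps
# Illustrative helper, not vendor code.

def max_external_ports(radix: int, ext_gbps: int, int_gbps: int) -> int:
    # Solve ext * (ext_gbps + int_gbps) <= radix * int_gbps for ext.
    return (radix * int_gbps) // (ext_gbps + int_gbps)

ext = max_external_ports(36, 40, 56)   # 21 external, leaving 15 uplinks
print(ext, 36 * ext)                   # 21 756 -- 36 leaves * 21 ports
# Sanity: 21 * 40 = 840 Gb/s external == 15 * 56 = 840 Gb/s internal.
```

At exactly 21 external ports the leaf is precisely balanced (840 Gb/s each way); the conservative 20/16 split quoted above leaves a little headroom and yields the 720-port figure.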
Another key KPI of the aggregation solution is the latency it introduces. The SwitchX ASIC routes L3 packets of any size at a flat cut-through latency of 330ns. In the VMS, traffic between two different leaf switches always follows a leaf-spine-leaf path, so latency is a predictable ~1us regardless of network load or traffic distribution pattern.
An exception to the above is the case of 56GbE uplinks. Since traffic enters the VMS at 40Gb/s but leaves the leaf toward the spine at a higher rate, the leaf must run in (partial) store-and-forward mode. This affects the latency of the VMS, taking it from the ~1us stated above to ~2us when the leaf devices run in store-and-forward mode.
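A back-of-envelope model makes the two figures above concrete. The assumptions here are mine: a 1500-byte frame, the 330ns per-hop cut-through figure from above, and store-and-forward buffering at the two leaf hops only; the exact penalty in a real deployment depends on frame size and on which hops buffer.

```python
# Rough VMS latency budget: a leaf-spine-leaf path is 3 switch hops.
# Cut-through costs a flat 330 ns per hop; a store-and-forward hop
# additionally waits for the full frame (frame_bits / line_rate).
# Frame size and buffering hops are assumptions for illustration.

HOP_NS = 330  # SwitchX cut-through latency per hop

def vms_latency_ns(frame_bytes: int, rate_gbps: float, sf_hops: int) -> float:
    # bits / (Gbit/s) conveniently yields nanoseconds directly.
    serialization_ns = frame_bytes * 8 / rate_gbps
    return 3 * HOP_NS + sf_hops * serialization_ns

print(vms_latency_ns(1500, 40.0, 0))  # pure cut-through: 990 ns (~1 us)
print(vms_latency_ns(1500, 40.0, 2))  # both leaves buffering: 1590 ns
```

With larger frames or more buffering hops the total approaches the ~2us quoted above, which is why the 56GbE-uplink variant trades some latency for the extra port count.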
To conclude, chassis-based solutions look like a legacy approach that no longer meets the requirements of the modern data center network. A distributed solution at least matches the chassis-based approach in very large networks and exceeds it as network scale decreases. Building the strongest VMS is now simply a matter of choosing the strongest building block.
Author: Since 2011, Ran Almog has served as Sr. Product Manager for Ethernet Products. Prior to joining Mellanox, Ran worked at Nokia Siemens Networks as a solution sales and marketing specialist for the packet networks business unit. Ran holds a BSc in Electrical Engineering and Computer Sciences from Tel Aviv University, Israel.