VXLAN has been a hot protocol for the past few years, and network architects have been looking at it to see whether it solves the use cases that have arisen from the explosion of data, the evolution of agile workloads, scalability requirements, OpenStack adoption, and many other drivers. VXLAN has shown the promise to provide:
Scale: Delivering L2 networks at scale (up to 16M VNIs) on top of proven, highly scalable L3 networks (by using an L3 underlay)
Agility: Providing a VM-ready networking infrastructure – with VM mobility, multi-tenancy, security, etc.
Programmability: Bringing more programmability and flexibility to work with network controllers and cloud orchestration stacks such as OpenStack.
In the past few years, even with all these promises, there was not much adoption of VXLAN in data centers, for the following reasons:
Controller-based solutions. These don’t scale for cloud use cases. A few hundred nodes are fine, but what about thousands of nodes?
Not ‘open’ enough. It lacked the DNA sought after by the new generation of network architects – it was not “open” enough to work with the open ecosystem. Some of the questions I have heard from customers in the past: Does it work only with ESX? Can I make it work with OpenStack without any controller? Can I use a REST interface to configure the switch directly? Can I use Ansible to configure it? Can I integrate control plane updates into a Slack SRE channel?
Proprietary & costly. Controller-based VXLAN orchestration systems are proprietary and costly (thousands of dollars per socket), and not flexible enough to be used with open ecosystems.
Multicast-based control plane. The original flood-and-learn approach depends on multicast in the underlay, and some vendors use their own custom control planes instead; meanwhile, customers must wait for the industry to converge on a standard.
No end-to-end VXLAN story. There are non-virtualized servers, VMs and containers. (Yeah – you can find the hardware VTEP story with controllers – but really, how well does that work?)
While VXLAN has proven its viability over the past few years, a gap was always visible: there was no standard that had all the properties of VXLAN and the support of multiple vendors; one that could eventually cover most of the use cases while supporting the open networking and network disaggregation concepts that most data center players now care about.
So what changed now?
The answer is simple: VXLAN got a new control plane – EVPN.
With EVPN the following things have changed significantly in the world of VXLAN:
BGP as control plane. The BGP control plane is used for VTEP discovery and for learning MAC and IP routes from other VTEPs; the exchange of this information takes place using EVPN NLRIs. With BGP as the control plane, VXLAN now has a trusted foundation; it is here to stay, with all the goodness BGP provides.
Converging vendors. The BGP-based open standard/RFC promises to converge the industry on a standard. Not only have switch software vendors like Cumulus, Cisco, Juniper, Arista and Mellanox already converged on EVPN, but even controller vendors like Nuage and Contrail have converged on this standard.
Integrated networks. EVPN can create a truly integrated network containing non-virtualized bare metal servers, VMs, and containers running on hypervisors – all converging with EVPN.
OpenStack support. OpenStack is converging with EVPN through its BaGPipe BGP driver.
VXLAN routing. VXLAN routing makes it possible to route between subnets without a controller or a DVR (distributed virtual router); with anycast gateway and VRF support, it covers most of the requirements of virtual workloads.
DCI (Data Center Interconnect) solutions. VXLAN-based DCI, built on QinVXLAN, provides at scale the same functionality that other DCI solutions provide. It is more prominent now that most enterprises have workloads running in AWS or Azure.
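To make the control plane point concrete, here is a minimal sketch of what enabling BGP EVPN can look like on an open routing stack such as FRRouting. The AS numbers, router ID and neighbor address are hypothetical example values, not a recommended design:

```
! Hypothetical FRRouting configuration on a leaf switch acting as a VTEP.
! ASN 65001, router ID 10.0.0.1 and spine neighbor 10.0.0.254 are examples.
router bgp 65001
 bgp router-id 10.0.0.1
 neighbor 10.0.0.254 remote-as 65000
 !
 address-family l2vpn evpn
  neighbor 10.0.0.254 activate
  advertise-all-vni
 exit-address-family
```

With `advertise-all-vni`, the switch originates EVPN routes for every locally configured VXLAN VNI, so remote VTEPs and their MAC/IP bindings are learned over BGP rather than via flood-and-learn.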
Evaluating VXLAN switches?
What to look for in a VXLAN-enabled switch:
Look for the right ASIC. To start with, look for a switch that has a proven control and data plane. A Tolly report will typically give you all of the following details. You can see the Mellanox Tolly report here and analyze the following:
Packets per second. A 32×100GbE switch at line rate must forward about 4.76 billion packets per second at the smallest (64-byte) packet size; make sure your vendor has that covered.
Latency. If this matters to you, 300-nanosecond latency at all packet sizes is readily available if you look for it.
Microburst absorption ability.
Fairness in how buffers are shared between ports.
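The packets-per-second figure is easy to sanity-check yourself. A back-of-the-envelope calculation for a 32×100GbE switch at the worst case (64-byte frames, plus 20 bytes of preamble and inter-frame gap on the wire):

```python
# Line-rate packet throughput for a 32x100GbE switch, worst case.
PORTS = 32
PORT_SPEED_BPS = 100e9           # 100 Gb/s per port
FRAME_BYTES = 64                 # minimum Ethernet frame size
OVERHEAD_BYTES = 8 + 12          # preamble + inter-frame gap

bits_per_frame = (FRAME_BYTES + OVERHEAD_BYTES) * 8   # 672 bits on the wire

pps_per_port = PORT_SPEED_BPS / bits_per_frame        # ~148.8 Mpps per port
total_pps = PORTS * pps_per_port                      # ~4.76 Bpps

print(f"{total_pps / 1e9:.2f} billion packets per second")
```

Larger frames lower the packets-per-second requirement, so a switch that sustains line rate at 64 bytes handles all other sizes as well.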
Look for the right scale
How many L3 routes (Can the switch support more than 128K routes?)
VXLAN VNI scale (Can the switch support more than you need?)
VTEP scale (Can the switch support more than 128 VTEPs?)
Open Networking. One of the reasons for looking at EVPN is that it is based on open standards and can work with the open ecosystem. Look for a switch that supports open networking and can give you disaggregated options if needed. In short, look for a vendor that can support multiple OS options.
Container support. Look for a container-ready switch. You need a switch that not only has the ability to run containers on the switch for “off-the-shelf” features, but also has the capacity to track and monitor your containers.
Telemetry matters. Look for a switch that can give you streaming insight into what’s happening inside the network in the form of histograms, not just point data.