All posts by Vishal Shukla

About Vishal Shukla

Vishal Shukla is a Director of Ethernet Switch Technologies at Mellanox, responsible for technical solutions and partner product management for the company's Ethernet portfolio. Prior to joining Mellanox, Vishal held various executive and leadership roles at IBM, BNT, Cisco Systems, Nortel Networks and Infosys Technologies. He holds 25+ patents (15+ granted) in the fields of software-defined networking, IoT, cloud automation, cloud security, cloud orchestration and cloud performance. He has authored and published several books on SDN, OpenFlow and OpenStack technologies. Vishal holds an MBA from Duke University's Fuqua School of Business (US) and a B.S. degree from U.P. Technical University, India.

How to make advanced networks for Hybrid/Multi Cloud?

As cloud use cases and the public cloud mature, hybrid cloud and multi-cloud adoption is growing significantly. Hybrid cloud is the preferred enterprise strategy, according to RightScale's 2017 State of the Cloud Report. The trend clearly shows that more and more enterprises are deploying less critical workloads in the cloud while running critical databases (or even applications) in on-premises data centers. The concept behind this trend is known as edge computing (sometimes also called fog computing), where most of the local and critical processing is done at the edge instead of sending all the data to the cloud. Public cloud providers clearly recognize the move toward edge computing and hybrid cloud; Azure and the on-premises Azure Stack are both evidence of this growing trend.

Almost all enterprises using the cloud believe in a multi-cloud approach: making sure they are not locked in with just one cloud vendor, either by keeping an on-premises presence or by using multiple cloud vendors. So, hybrid cloud comes in two flavors:

  • On-premises + public cloud combination
  • Public cloud 1 + public cloud 2 combination

In both cases, networking for the hybrid cloud is key.

Cloud Ready Networks

In the past few years, networking has also evolved to support cloud use cases. BYOIP, multi-tenancy, agile workloads, DevOps, massive data growth, machine learning and advanced visibility requirements have all pushed networks to evolve.

  • New technologies such as VXLAN, BGP unnumbered, EVPN, Segment Routing and advanced visibility have adapted networking to the cloud's needs for scale, agility, programmability and flexibility (see the BGP unnumbered sketch after this list).
  • Open networking has helped tier-1 and tier-2 cloud vendors, as well as cost-savvy "as a service" cloud providers, grow their data centers exponentially while still keeping costs down.
  • Open source ecosystems such as OpenStack have helped accelerate innovation while bringing all the components of the cloud together without vendor lock-in.
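
For example, BGP unnumbered lets a leaf switch peer with its spines over the interfaces themselves, with no per-link IP addressing. The following is only a minimal FRR-style sketch; the ASN and port names are hypothetical and the exact commands vary by platform:

    # Peer directly over the fabric interfaces; IPv6 link-local addresses are
    # used for the sessions, so no per-link IPs need to be assigned
    vtysh \
      -c 'configure terminal' \
      -c 'router bgp 65101' \
      -c 'neighbor swp1 interface remote-as external' \
      -c 'neighbor swp2 interface remote-as external'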

What about Hybrid Cloud Networks?

Hybrid cloud networks require special treatment because they must connect workloads sitting in different environments, which in turn sit in different domains and likely run different protocols. Data Center Interconnect (DCI) is another term used for hybrid cloud networking. In the past, many technologies have been available for DCI. QinQ is a well-known one, in which a VLAN is encapsulated in another VLAN, essentially preserving the service tag per customer. Beyond QinQ, there have been technologies such as EoMPLS, VPLS and OTV. All of these were good at solving the challenges associated with older data centers.

New data centers designed for the latest cloud properties (multi-tenancy, high speed, application-level segregation, etc.) require a more capable DCI technology: a protocol that can identify not only a customer network (in a multi-tenant environment) but also the service running inside that customer network, which is another layer of segmentation inside the customer network. A sketch of this double tagging follows.
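
To make the two layers of segmentation concrete, here is a minimal Linux sketch of QinQ-style double tagging (the interface name and VLAN IDs are hypothetical). The outer 802.1ad service tag identifies the customer, while the inner 802.1Q tag identifies the service inside that customer's network:

    # Outer S-tag (802.1ad) identifies the customer
    ip link add link eth0 name eth0.100 type vlan protocol 802.1ad id 100
    # Inner C-tag (802.1Q) identifies the service inside the customer network
    ip link add link eth0.100 name eth0.100.10 type vlan protocol 802.1Q id 10
    ip link set eth0.100 up
    ip link set eth0.100.10 up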

QinVNI for Hybrid Cloud Networks


In the past, QinQ was used to stretch customer VLANs between data centers. With VXLAN becoming the prominent way of connecting clouds over L3 fabrics, the QinQ approach has evolved into QinVNI. The concept remains the same: preserve the service and customer tags and map traffic to the right customer and service inside a multi-tenant environment. The following figure explains how this feature works.

[Figure: customer VLANs mapped to VXLAN VNIs at the data center edge, with inner service tags preserved]

In the above example, a single translation happens at the edges, while the internal service tag is preserved and delivered intact to the cloud. The technology scales with the number of VXLAN VNIs supported in the edge switches. A quick way to verify this on a Linux-based edge is sketched below.
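
One way to see this preservation in practice (assuming a Linux-based edge; the interface names here are hypothetical) is to capture the VXLAN traffic on the underlay uplink and inspect the tunnel's forwarding entries:

    # Capture VXLAN traffic (UDP 4789) on the underlay uplink; the decoded
    # inner Ethernet frames still carry the customer's 802.1Q service tags
    tcpdump -nn -e -i uplink0 udp port 4789

    # On the edge device, remote MACs learned over the tunnel can be inspected
    # per VXLAN interface (vx10100 is a hypothetical VXLAN device name)
    bridge fdb show dev vx10100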

With the rise of hybrid cloud, it is only a matter of time before you need to connect to the cloud. Technologies such as VPN gateways and direct connections are available from cloud providers, but how flexibly you connect using those technologies is up to you to design. With granularity now reaching the workload level, it is high time networks are defined at the service level using technologies like QinVNI.

Why QinVNI for Hybrid Cloud Networks

QinVNI is a new hybrid cloud network / data center interconnect technology that offers the best of VXLAN and QinQ in a single protocol. Hybrid cloud use cases are expanding, whether it is storage (storage on-premises with DR/backup in the cloud), enterprise workloads (databases on-premises with compute in the cloud), or multi-cloud scenarios. As the number of hybrid cloud use cases grows, networking for these scenarios becomes crucial. QinVNI provides multi-tenant hybrid cloud networking by preserving VLANs inside VXLAN.

QinVNI with Mellanox

For QinVNI to work properly at scale, the switch at the edge must have a scalable VXLAN implementation. Below is an example of a POC that was set up for a well-known cloud provider. The POC demonstrates a multi-tenant environment in which VLANs are preserved inside VXLAN headers and delivered to the on-premises data center. The same VLANs are used for different tenants.

The following section gives details on how to configure hybrid cloud networking on Spectrum-based Mellanox platforms.

[Topology diagram: POC with two data centers connected over VXLAN]

The topology above shows the POC, which has the following components:


  1. Two data centers (Data Center 1 and Data Center 2). Data Center 1 is the customer's public cloud (for customers A and B) and Data Center 2 is on-premises for the same customers A and B.
  2. Each data center has two servers (each for a tenant), and each server has three VMs (each in one VLAN).
  3. Each tenant uses the same VLANs.
  4. Each customer is assigned different VNIs.
  5. In this example the VTEPs sit on the Data Center 2 servers on one side, and on the edge ToR of Data Center 1 on the other side.
  6. For simplicity, the configurations use static VXLAN tunnels between the VTEP on the compute node and the VTEP on the ToR. EVPN could instead be used to advertise VTEPs and MAC/IP routes, rather than statically configuring the VTEPs and learning MAC/IP in the data plane.


This blog does not cover the underlay configuration and assumes that L3 connectivity already exists on the underlay.

Following is the configuration of the VTEP on the ToR (connected to Data Center 1).
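
As an illustrative sketch only (not the exact Mellanox switch configuration; the addresses, port names, VLAN IDs and VNIs are hypothetical), a static ToR VTEP that keeps the two customers separate while allowing the same inner service VLANs could look like this in Linux terms:

    # Customer A: strip the outer 802.1ad tag 100 and map it to VNI 10100,
    # tunneled to the Data Center 2 server VTEP
    ip link add link swp1 name swp1.100 type vlan protocol 802.1ad id 100
    ip link add vx10100 type vxlan id 10100 local 10.0.0.1 remote 10.0.0.2 dstport 4789
    ip link add br-custA type bridge
    ip link set dev swp1.100 master br-custA
    ip link set dev vx10100 master br-custA
    ip link set dev swp1.100 up
    ip link set dev vx10100 up
    ip link set dev br-custA up

    # Customer B: outer tag 200 maps to VNI 10200; the inner service VLANs
    # may overlap with customer A's because the VNIs keep them separate
    ip link add link swp2 name swp2.200 type vlan protocol 802.1ad id 200
    ip link add vx10200 type vxlan id 10200 local 10.0.0.1 remote 10.0.0.3 dstport 4789
    ip link add br-custB type bridge
    ip link set dev swp2.200 master br-custB
    ip link set dev vx10200 master br-custB
    ip link set dev swp2.200 up
    ip link set dev vx10200 up
    ip link set dev br-custB up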

Following is the on-premises server configuration for the static VTEP (on Server 2):
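
Again, this is only an illustrative sketch rather than the original listing; the device names, VLAN IDs, VNI and addresses are hypothetical. The static VTEP on one tenant's on-premises server terminates the same VNI and hands the three service VLANs to the local VMs:

    # Static VTEP on the server, tunneling toward the Data Center 1 edge ToR
    ip link add vx10100 type vxlan id 10100 local 10.0.0.2 remote 10.0.0.1 dstport 4789
    ip link add br-custA type bridge vlan_filtering 1
    ip link set dev vx10100 master br-custA
    ip link set dev vx10100 up
    ip link set dev br-custA up

    # Carry the three service VLANs tagged across the tunnel
    bridge vlan add dev vx10100 vid 10
    bridge vlan add dev vx10100 vid 20
    bridge vlan add dev vx10100 vid 30

    # Attach each VM's tap interface as an access port in its service VLAN
    ip link set dev tap-vm1 master br-custA
    bridge vlan add dev tap-vm1 vid 10 pvid untagged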


Conclusion

QinVNI is the latest and most mature technology for DCI and hybrid cloud networks designed for new-generation leaf/spine L3 fabrics. Mellanox customers have designed, tested and deployed multiple hybrid cloud networks running at 100GbE with a best-in-class ASIC, without compromising on scale or performance. Contact us today to discuss how we can help with your data center interconnect or hybrid cloud networking challenges.


Is it Time to Upgrade to VXLAN?

VXLAN has been a hot protocol for the past few years, and network architects have been looking at it to see if it solves the use cases that have arisen from the explosion of data, the evolution of agile workloads, scalability requirements, OpenStack adoption, and many other drivers. VXLAN has shown the promise to provide:

  • Scale: Delivering L2 networks at scale (the 24-bit VNI allows roughly 16M segments, versus about 4K VLANs) on top of highly scalable and proven L3 networks (by using an L3 underlay)
  • Agility: Providing VM-ready networking infrastructure, with VM mobility, multi-tenancy, security, etc.
  • Cloud networking: Offering multiple solutions for private, public and hybrid cloud networks
  • Programmability: Bringing more programmability and flexibility for working with network controllers and cloud orchestration stacks such as OpenStack.

In the past few years, even with all these promises, VXLAN saw limited adoption in data centers, for the following reasons:

  • Controller-based solutions. They don't scale for cloud use cases: a few hundred nodes are fine, but what about thousands of nodes?
  • Not 'open' enough. Controller-based VXLAN lacked the DNA sought by the new generation of network architects; it was not "open" enough to work with the open ecosystem. Some of the questions I have heard from customers in the past: Does it work only with ESX? Can I make it work with OpenStack without a controller? Can I use a REST interface to configure the switch directly? Can I use Ansible to configure it? Can I push control plane updates to a Slack SRE channel?
  • Proprietary and costly. Controller-based VXLAN orchestration systems are proprietary and costly (thousands of dollars per socket) and not flexible enough to be used with open ecosystems.
  • Multicast-based control plane. Some vendors use their own custom control planes, and in the meantime customers must wait for the industry to converge on a standard.
  • No end-to-end VXLAN story. Data centers mix non-virtualized servers, VMs and containers. (Yes, you can find a hardware VTEP story with controllers, but how well does that really work?)

While VXLAN has proven its viability over the past few years, a gap was always visible: there was no standard that has all the properties of VXLAN and also the support of multiple vendors; a VXLAN that can eventually cover most of the use cases while supporting the open networking and network disaggregation concepts that most data center players now care about.

So what changed now?

The answer is simple: VXLAN got a new control plane – EVPN.

With EVPN, the following things have changed significantly in the world of VXLAN:

  • BGP as the control plane. The BGP control plane is used for VTEP discovery and for learning MAC and IP routes from other VTEPs; the exchange of this information takes place using EVPN NLRIs. Having BGP as the control plane shows that VXLAN now has a trusted foundation and is here to stay, with all the goodness BGP provides (see the configuration sketch after this list).
  • Converging vendors. The BGP-based open standard/RFC promises to converge the industry on one standard. Not only have switch software vendors like Cumulus, Cisco, Juniper, Arista and Mellanox already converged on EVPN, but even controller vendors like Nuage and Contrail have converged on this standard.
  • Integrated networks. EVPN makes it possible to build a truly integrated network containing non-virtualized bare metal servers, VMs and containers running on hypervisors, all converging on EVPN.
  • OpenStack support. OpenStack is converging with EVPN through its BaGPipe BGP driver.
  • VXLAN routing. Routing between subnets is now possible without a controller or a DVR (distributed virtual router); with anycast gateway support along with VRFs, it covers most of the requirements of virtual workloads.
  • DCI (Data Center Interconnect) solutions. VXLAN-based DCI, built on QinVNI (QinVXLAN), provides at scale the same functionality that other DCI technologies provide. It is more relevant now that most enterprises have workloads running in AWS or Azure.
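
To give a feel for the BGP control plane, here is a minimal sketch of enabling EVPN on a switch running FRR (the ASN and neighbor interface are hypothetical, and the exact commands vary by platform and release). Activating the l2vpn evpn address family is what lets the switch advertise its VTEP and the MAC/IP routes behind it:

    # Enable BGP EVPN toward an uplink peer and advertise all locally configured VNIs
    vtysh \
      -c 'configure terminal' \
      -c 'router bgp 65001' \
      -c 'neighbor swp51 interface remote-as external' \
      -c 'address-family l2vpn evpn' \
      -c 'neighbor swp51 activate' \
      -c 'advertise-all-vni' \
      -c 'exit-address-family'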

Evaluating VXLAN switches?

What to look for in a VXLAN-enabled switch:

  • Look for the right ASIC. To start with, look for a switch that has a proven control and data plane. Typically, a Tolly report gives all of the following details. You can see the Mellanox Tolly report here and analyze the following:
    • Packets per second. A 32×100GbE switch can forward 4.47 billion packets per second (across all packet sizes); make sure your vendor has that covered.
    • Latency. If this matters to you, 300 nanosecond latency at all packet sizes is readily available if you look for it.
    • Microburst absorption ability.
    • Fairness in how buffers are shared between ports.
  • Look for the right scale:
    • How many L3 routes? (Can the switch support more than 128K routes?)
    • VXLAN VNI scale (Can the switch support more VNIs than you need?)
    • VTEP scale (Can the switch support more than 128 VTEPs?)
  • Open networking. One of the reasons for looking at EVPN is that it is based on open standards and works with the open ecosystem. Look for a switch that supports open networking and can give you disaggregated options if needed. In short, look for a vendor that can support multiple OS options.
  • Container support. Look for a container-ready switch. You need a switch that not only can run containers on the switch itself for off-the-shelf features, but also has the capacity to track and monitor your containers.
  • Telemetry matters. Look for a switch that can give you streaming insight into what is happening inside the network in the form of histograms, not just point data.
  • Care about machine learning? Running a machine learning or AI-related workload? Then look for a RoCE-proven switch and solution.
  • DevOps readiness. Look for a switch that can support DevOps and agile infrastructure-management approaches, because that is the future!
