As a network engineer, why should you care about what the developers are doing with Kubernetes? Isn’t it just another application consuming network resources?
Kubernetes is quickly becoming the new standard for deploying and managing containers in the hybrid cloud. Using the same orchestration on-premises and in the public cloud, with the same API across bare metal and public clouds, allows a high level of agility and ease of operations. Kubernetes (K8s) is an open-source container-orchestration system for automating the deployment, scaling, and management of containerized applications. It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation.
A node is the smallest unit of computing hardware in Kubernetes. It represents a single machine in a cluster. In most production systems, a node will be either a physical server or a virtual machine hosted on-premises or in the cloud.
When applications are deployed onto the cluster, Kubernetes intelligently distributes the work across the individual nodes. If nodes are added or removed, the cluster shifts workloads around as necessary. It should not matter to the application, or to the developer, which individual nodes are actually running the code.
Since applications running on the cluster are not guaranteed to run on a specific node, data cannot be saved to any arbitrary place in the file system. If an application tries to save data for later usage but is then relocated onto a new node, the data will no longer be where the application expects it to be. For this reason, the traditional local storage associated with each node is treated as a temporary cache to hold applications, but any data saved locally cannot be expected to persist.
To store data permanently, Kubernetes uses Persistent Volumes. While the CPU and RAM resources of all nodes are effectively pooled and managed by the cluster, persistent file storage is not. Instead, local or cloud storage can be attached to the cluster as a Persistent Volume.
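As a minimal sketch of how an application asks for such storage, a pod can request a Persistent Volume through a PersistentVolumeClaim; the name and size below are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data            # illustrative name
spec:
  accessModes:
    - ReadWriteOnce         # volume mounted read-write by a single node
  resources:
    requests:
      storage: 10Gi         # illustrative size
```

A pod then mounts the claim by name (via `spec.volumes[].persistentVolumeClaim.claimName`), so its data survives even if the pod is rescheduled onto a different node.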
Applications running on Kubernetes are packaged as Linux containers. Containers are a widely accepted standard, so there are already many pre-built images that can be deployed on Kubernetes.
Containerization allows the creation of self-contained Linux execution environments. Any application and all its dependencies can be bundled up into a single file. Containers allow powerful CI (continuous integration) and CD (continuous deployment) pipelines to be formed as each container can hold a specific part of an application. Containers are the underlying infrastructure for Microservices.
Microservices are a software development technique, an architectural style that structures an application as a collection of loosely coupled services. The benefit of decomposing an application into different smaller services is that it improves modularity. This makes the application easier to understand, develop, test, and deploy.
Kubernetes doesn’t run containers directly. Instead, it wraps one or more containers into a higher-level structure called a pod. Any containers in the same pod will share the same Node and local network. Containers can easily communicate with other containers in the same pod as though they were on the same machine while maintaining a degree of isolation from others.
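To illustrate the shared pod network, here is a sketch of a pod with two containers, where the second reaches the first over localhost because they share the same network namespace (names and images are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar    # illustrative name
spec:
  containers:
    - name: web
      image: nginx:1.21     # illustrative image
      ports:
        - containerPort: 80
    - name: sidecar
      image: busybox        # illustrative image
      # The sidecar can reach the web container on localhost:80,
      # since all containers in a pod share one IP and port space.
      command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 > /dev/null; sleep 10; done"]
```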
Pods are used as the unit of replication in Kubernetes. If your application becomes too heavy and a single pod instance can’t carry the load, Kubernetes can be configured to deploy new replicas of your pod to the cluster as necessary. Even when not under heavy load, it is standard to have multiple copies of a pod running at any time in a production system to allow load balancing and failure resistance.
Although pods are the basic unit of computation in Kubernetes, they are not typically launched directly on a cluster. Instead, pods are usually managed by one more layer of abstraction: the Deployment. A deployment's purpose is to declare how many replicas of a pod should be running at a time. When a deployment is added to the cluster, it automatically spins up the requested number of pods and then monitors them. If a pod dies, the deployment automatically re-creates it. Using a deployment, you don't have to deal with pods manually. You can just declare the desired state of the system, and it will be managed for you automatically.
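A minimal Deployment sketch showing the declarative model described above (names, labels, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # illustrative name
spec:
  replicas: 3               # desired state: three pod replicas at all times
  selector:
    matchLabels:
      app: web              # illustrative label
  template:                 # pod template the deployment creates and monitors
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.21 # illustrative image
```

If any of the three pods dies, the deployment controller notices the gap between the desired and actual state and re-creates the pod automatically.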
Example of a web application deployment over Kubernetes
Service/Micro-service: A Kubernetes Service is an abstraction which defines a logical set of pods and a policy by which to access them. Services enable loose coupling between dependent pods.
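A sketch of a Service that selects pods by label rather than by IP, which is what enables the loose coupling (the name and label are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                 # illustrative name
spec:
  selector:
    app: web                # targets any pod carrying this label, wherever it runs
  ports:
    - port: 80              # port exposed by the service
      targetPort: 80        # port the pods listen on
```

Clients talk to the stable service address; as pods are created, destroyed, and rescheduled, the set of backends behind the service updates automatically.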
The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them. As a service mesh grows in size and complexity, it can become harder to understand and manage. Its requirements can include discovery, load balancing, failure recovery, metrics, and monitoring. A service mesh also often has more complex operational requirements, like A/B testing, canary releases, rate limiting, access control, and end-to-end authentication.
One of the most popular projects for controlling a service mesh is Istio, an open-source, independent service that provides the fundamentals you need to successfully run a distributed microservice architecture.
Istio provides behavioral insights and operational control over the service mesh as a whole, offering a complete solution to satisfy the diverse requirements of microservice applications.
With Istio, all instances of an application have their own sidecar container. This sidecar acts as a service proxy to all outgoing and incoming network traffic.
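Sidecar injection is commonly enabled per namespace with a label, so every pod created in that namespace automatically gets the proxy container; a sketch (namespace name illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo                    # illustrative namespace
  labels:
    istio-injection: enabled    # Istio injects the sidecar proxy into pods created here
```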
At its core, Kubernetes Networking has one important fundamental design philosophy:
Every Pod has a unique IP.
The Pod IP is shared by all the containers inside, and it’s routable from all the other Pods. A huge benefit of this IP-per-pod model is there are no IP or port collisions with the underlying host. There is no need to worry about what port the applications use.
With this in place, the only requirement Kubernetes has is that Pod IPs are routable/accessible from all the other pods, regardless of what node they’re on.
In the Kubernetes networking model, in order to reduce complexity and make app porting seamless, a few rules are enforced as fundamental requirements: all pods can communicate with all other pods without NAT, and agents on a node (such as the kubelet) can communicate with all pods on that node.
There is a vast number of network implementations for Kubernetes. Among them, Flannel and Calico are probably the most popular ones used as network plugins through the Container Network Interface (CNI). CNI can be seen as the simplest possible interface between container runtimes and network implementations, with the goal of creating a generic, plugin-based networking solution for containers.
Flannel can run using several encapsulation backends with VXLAN being the recommended one.
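The backend is selected in the `net-conf.json` key of Flannel's ConfigMap; a typical VXLAN configuration looks roughly like this (the pod network CIDR is illustrative):

```yaml
# Fragment of the kube-flannel ConfigMap (values illustrative)
net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }
```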
L2 connectivity is required between the Kubernetes nodes when using Flannel with VXLAN.
Due to this requirement, the size of the fabric might be limited: if a pure L2 network is deployed, the number of racks connected is limited to the number of ports on the spine switches.
To overcome this issue, it is possible to deploy an L3 fabric with VXLAN/EVPN at the leaf level. L2 connectivity is then provided to the nodes on top of a BGP-routed fabric that scales easily. VXLAN packets coming from the nodes are encapsulated into VXLAN tunnels running between the leaf switches.
The Mellanox Spectrum ASIC provides huge value when it comes to VXLAN throughput, latency, and scale. Most switches support up to 128 remote VTEPs, meaning up to 128 racks in a single fabric; the Mellanox Spectrum ASIC supports up to 750 remote VTEPs, allowing up to 750 racks in a single fabric.
Calico is not really an overlay network; rather, it can be seen as a pure IP networking fabric (leveraging BGP) for Kubernetes clusters across clouds.
A typical Calico deployment looks as follows:
In a Calico network, each endpoint is a route. Hardware networking platforms are constrained by the number of routes they can learn, usually in the range of tens to hundreds of thousands of routes. Route aggregation can help, but that usually depends on the capabilities of the scheduler used by the orchestration software (e.g., OpenStack).
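In such a deployment, Calico nodes typically peer with the top-of-rack switch over BGP. A sketch of a Calico BGPPeer resource (the peer address and AS number are illustrative):

```yaml
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: rack1-tor           # illustrative name
spec:
  peerIP: 10.0.1.1          # illustrative ToR switch address
  asNumber: 64512           # illustrative AS number of the ToR switch
```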
When choosing a switch for your Kubernetes deployment, make sure its routing table is large enough not to limit the compute scale of your cluster.
The Mellanox Spectrum ASIC provides a fully flexible table size, enabling up to 176,000 IP route entries with Spectrum1 and up to 512,000 with Spectrum2, supporting the largest Kubernetes clusters run by the biggest enterprises worldwide.
When working with Cumulus Linux OS on the switch layer, you would probably want to use FRR as the routing stack on your nodes, leveraging BGP unnumbered.
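A sketch of what BGP unnumbered could look like in the node's FRR configuration, assuming illustrative interface names toward the leaf switches and an illustrative private ASN:

```
router bgp 65101                             ! illustrative node ASN
  bgp router-id 10.0.0.11                    ! illustrative router ID
  neighbor eth0 interface remote-as external ! unnumbered peering toward leaf 1
  neighbor eth1 interface remote-as external ! unnumbered peering toward leaf 2
  address-family ipv4 unicast
    redistribute connected                   ! advertise local pod routes
```

With unnumbered peering, no per-link IP addressing is needed; sessions are established over the interfaces' link-local addresses.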
If you are looking for a purely open-source solution, you should check out the Mellanox LinuxSwitch, which supports both FRR and BIRD as the routing stack.
Containers are automatically spun up and destroyed as needed on any server in the cluster. Since containers live inside a host, they can be invisible to network engineers, who may never know where they are located or when they are created and destroyed.
Operating modern agile data centers is notoriously difficult with limited network visibility and changing traffic patterns.
By using Cumulus NetQ on top of Mellanox Spectrum switches running the Cumulus OS, network engineers can get wide visibility into Kubernetes deployments and operate in these fast-changing dynamic environments.