Unless you’ve been stranded on a deserted island, you’ve probably noticed that leaf/spine networks have taken over data centers in the last few years. It’s no secret that people prefer scale-out over scale-up solutions, and for networking, the old scale-up approach was to use massive monolithic modular switches, a.k.a. Big Frickin’ Switches (BFS), whenever high port counts were needed. These BFS switches have gone the way of mainframes and Pokémon Go; some people still play with them, but all the cool kids have moved on to leaf/spine networks. The market data clearly shows this trend: years ago the majority of switches were modular, but now around 75% of all data center switch ports belong to fixed-port switches:
Now, you shouldn’t look down on old companies still buying the BFS approach from Cisco/Arista any more than you should judge an elderly person too harshly for smoking cigarettes. They didn’t know any better when they started, and now they’re too set in their ways to change. But whenever I see a young person smoking, I can’t help thinking “really?” I get the same feeling whenever I see networks designed with big modular switches. I feel like asking, “haven’t you seen the warning labels?” (i.e., the price tags and power consumption figures).
Now, I’m not saying BFS are evil, but they *are* deployed like Sith Lords from Star Wars: “always two there are, no more, no less,” while scale-out leaf/spine architectures spread traffic across many small fixed-port switches:
Modern data centers use fixed-port switches in leaf/spine topologies for all the right reasons:
Leaf/spine networks scale very simply: you just add switches incrementally as growth demands. They do have some natural sweet spots, though. For example, since a 32-port spine switch can connect to 32 leaf switches, a natural design might be 32 racks of servers with two tiers of switching, serving around 1,500 10/25GbE servers:
If you need a larger network, you would deploy these leaf/spine switches in “pods,” logical building blocks that you can easily replicate as needed. In each pod, you would reserve half of the spine ports for connecting to a super-spine, which allows for non-blocking connectivity between pods. A best practice in leaf/spine topologies with a lot of east/west traffic is to keep everything non-blocking above the leaf switches. This makes a natural pod size of up to 16 server racks, or 768 servers per pod, and you could easily have up to 32 pods for around 24,000 servers:
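The sizing math above is easy to sanity-check yourself. Here’s a minimal sketch, assuming 32-port spines and 48 server-facing ports per leaf (one leaf per rack); the 48-port figure is my assumption, chosen because it matches the ~1,500 servers over 32 racks in the example:

```python
# Leaf/spine sizing sketch. Assumptions (not from any vendor spec):
# 32-port spine switches, 48 server-facing ports per leaf, one leaf per rack.
SPINE_PORTS = 32
SERVER_PORTS_PER_LEAF = 48  # hypothetical, e.g. a 48x25GbE leaf

# Single two-tier fabric: every spine port feeds one leaf switch.
leaves = SPINE_PORTS                       # 32 leaves = 32 racks
servers = leaves * SERVER_PORTS_PER_LEAF   # 32 * 48 = 1536 (~1,500 servers)

# Pod design: half of each spine's ports go up to the super-spine,
# so a pod holds half as many leaves but stays non-blocking above them.
leaves_per_pod = SPINE_PORTS // 2                          # 16 racks per pod
servers_per_pod = leaves_per_pod * SERVER_PORTS_PER_LEAF   # 16 * 48 = 768
pods = 32                                                  # one pod per super-spine port
total_servers = pods * servers_per_pod                     # 24,576 (~24,000)

print(servers, servers_per_pod, total_servers)
```

Swap in your own leaf and spine port counts to find the sweet spots for your hardware; the structure of the calculation stays the same.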
There is no one-size-fits-all solution, so these are just examples to show what is possible, but whether you have 300 nodes or 30,000 nodes, chances are a leaf/spine network will work better for you than the old scale up model.
Now, I know there is resistance to adopting new approaches, and some of you are looking at a group of super-spine switches and thinking that a couple of BFS switches would be easier than all those little spine switches. The thing to remember is that the modular chassis take up around the same amount of rack space, consume more power, and…
Guess which one is going to perform best?
Guess which one is going to cost less?
I’ll give you a hint: just turn the question around and ask whether you should buy 16 small fixed-port switches, or 2 modular chassis + 16 line cards + 4 supervisor modules + 8 fabric cards?
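Just tallying the boxes makes the point, even before anyone quotes a price. A quick sketch of the component counts from the two options above (counts only; no pricing assumed):

```python
# Count of discrete components to buy, power, manage, and spare
# for each option described in the text.
fixed_port_option = {"fixed-port switches": 16}
modular_option = {
    "modular chassis": 2,
    "line cards": 16,
    "supervisor modules": 4,
    "fabric cards": 8,
}

print(sum(fixed_port_option.values()))  # 16 components
print(sum(modular_option.values()))     # 30 components
```

Thirty parts versus sixteen, and that’s before counting the chassis power supplies and fan trays; each modular component is also typically priced at a premium over a commodity fixed-port switch.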
You can see how Facebook uses 1U fixed-port switches where even their largest modular chassis aren’t big enough: Facebook Fabric Aggregator
In my next blog, I will write about how to make your leaf/spine network hum: