No one would dispute that good vision is important if you are a surgeon, a welder, or an Uber driver. In technology, whether you’re a Cloud Architect or in Network Operations, you really need good visibility into what is going on inside your data center. To sleep soundly at night, you have got to actively monitor your network performance and your application performance, and be on the lookout for security breaches. There are analyzers that specialize in each of these three distinct monitoring disciplines: Network Performance, Application Performance, and Security.
You need to “tap your own lines” by placing TAPs at key points in your network. These TAPs will copy all the data traversing the links they are attached to. Then, you need to aggregate those TAPs, consolidating all the flows onto a few high-bandwidth links to the analyzers. The modern, scaled-out approach for consolidating TAPs is to use a Software Defined TAP Aggregation Fabric, which amounts to a bunch of Ethernet switches that are specialized only in that they don’t run normal Layer 2/3 protocols. Instead, they steer specific flows to specific analyzers.
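To make "steering specific flows to specific analyzers" a bit more concrete, here is a minimal Python sketch of the idea. It is not how any particular TAP Aggregation switch is configured: the `SteeringRule` class, the rule fields, the analyzer names, and the example subnet are all hypothetical, and a real fabric would apply this kind of matching in hardware.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class SteeringRule:
    """Hypothetical match rule; a field left as None matches anything."""
    analyzer: str
    protocol: Optional[str] = None
    dst_port: Optional[int] = None
    src_net: Optional[str] = None

    def matches(self, pkt: dict) -> bool:
        if self.protocol is not None and pkt["protocol"] != self.protocol:
            return False
        if self.dst_port is not None and pkt["dst_port"] != self.dst_port:
            return False
        if self.src_net is not None and (
            ip_address(pkt["src_ip"]) not in ip_network(self.src_net)
        ):
            return False
        return True


# Illustrative rule table (assumed values), evaluated top to bottom.
RULES = [
    SteeringRule(analyzer="security", src_net="10.66.0.0/16"),
    SteeringRule(analyzer="application", protocol="tcp", dst_port=443),
    SteeringRule(analyzer="network"),  # catch-all: everything else
]


def steer(pkt: dict, rules=RULES) -> Optional[str]:
    """Return the analyzer named by the first matching rule, or None."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.analyzer
    return None
```

The first matching rule wins, so the more specific rules go at the top and the catch-all goes last, much like an access list.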
You might want the TAP Aggregation fabric to do more than just steer the right flows to the right analyzers. You may want your TAP Aggregators to do some of the following:
There is no universal consensus on where to place your TAPs, but there are some very common models:
Financial Services organizations frequently TAP every Tier of their network, so they can measure the latency as packets traverse the network while they also implement security monitoring:
Many Cloud Providers TAP every Rack in their data centers for their own monitoring purposes, as well as offering Application Performance reports to their customers:
If you have ever enabled too many debug features on a Cisco/Arista switch, you are rightfully a bit cautious. (Friendly advice: don’t do it unless you also want a switch reboot.)
TAP Aggregation switches are the ideal place to implement heavy duty Telemetry features because they cannot impact your production network.
One technique for determining which flows need to be analyzed is to start monitoring your traffic with sFlow. sFlow can give you a picture of the busiest flows, top talkers, top protocols, most flows, and various traffic anomalies. It can help you detect and diagnose network problems. It can also provide a glimpse into which applications are using the network most.
It also lets you see when something changes, and it can point out which flows should be sent on for further analysis.
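As a rough illustration of how a "top talkers" view falls out of sampled data, here is a small Python sketch. sFlow samples roughly one packet in N, so a common estimate is to scale each sampled frame by its sampling rate. The record layout (`src_ip`, `frame_bytes`, `sampling_rate`) and the sample values are assumptions made up for this example, not the sFlow wire format or real measurements.

```python
from collections import Counter


def top_talkers(samples, n=3):
    """Estimate bytes sent per source from 1-in-N packet samples.

    Each sampled frame stands in for roughly `sampling_rate` frames,
    so its length is scaled up by that factor before being summed.
    """
    totals = Counter()
    for s in samples:
        totals[s["src_ip"]] += s["frame_bytes"] * s["sampling_rate"]
    return totals.most_common(n)


# Made-up samples, all taken at an assumed 1-in-1000 sampling rate.
SAMPLES = [
    {"src_ip": "10.0.0.1", "frame_bytes": 1500, "sampling_rate": 1000},
    {"src_ip": "10.0.0.2", "frame_bytes": 64, "sampling_rate": 1000},
    {"src_ip": "10.0.0.1", "frame_bytes": 1500, "sampling_rate": 1000},
    {"src_ip": "10.0.0.3", "frame_bytes": 500, "sampling_rate": 1000},
]

for src, est_bytes in top_talkers(SAMPLES, n=2):
    print(f"{src}: ~{est_bytes} bytes")
```

The numbers are estimates, not exact counts, but they are usually good enough to tell you which flows deserve a closer look on a dedicated analyzer.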
Some of the best monitoring, analytics, and graphing tools are Open Source. Recently, folks have been well served by sending their sFlow data to sFlow-RT for analysis and then monitoring the state of their data center with Grafana: