Addressing Latency with Application Acceleration
Part 3 of the 3-part series
A key decision facing trading firms and exchanges is the selection of the optimum networking interconnect to achieve the lowest latency. In the first part of this series, I addressed how software improvements developed at the Mellanox Hackathon and the capabilities of the ConnectX® family of adapters help achieve latencies as low as 600 nanoseconds, well below the current industry standard. Part two discussed the ultra-low latencies of the Mellanox Spectrum switches, which are capable of less than 300 nanoseconds of latency. Hopefully, some of you were able to see the live demo by attending the roadshow co-sponsored by Mellanox, Cumulus Linux and CDW, which is just wrapping up. If you missed it, be sure to register for the upcoming webinar, Powering the Future of Low Latency Trading Platforms with Intelligent Interconnect, on May 24th, 2017.
In this final segment, I’ll wrap up the series by discussing additional features that should be considered when designing a low-latency solution. I’ll jump into the conversation by mentioning speed/bandwidth, which can have a material effect on latency. Consider that a car traveling at 60 miles per hour will reach its destination sooner than a car traveling at half that speed. We can apply the same principle to data traversing a network: at a higher link speed, each packet takes less time to be placed on the wire. Simply upgrading from 1 or 10G to 25G or higher allows the data to reach its destination faster, lowering latency. This is the number one reason 10GE server connections and networks are being replaced with 25, 50 and 100GE solutions. By making the speed jump from 10 to 25GE or greater, an organization will see a substantial increase in bandwidth and message rates. 25GbE provides 2.5x the bandwidth of 10GbE at a cost increase of only about 1.5x. This ensures that the network and the connections to servers are not creating a bottleneck that might impede optimum trade execution. Low-latency networks provide better execution rates and allow more in-line processing for electronic trading and other latency-sensitive applications.
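To put rough numbers on the speed argument, here is a minimal sketch that computes serialization delay, the time needed to clock one frame onto the wire. The 1500-byte frame size is an assumption chosen for illustration, and the calculation ignores preamble, inter-frame gap, propagation and switch latency, so treat it as a lower bound rather than a measured result.

```python
def serialization_delay_ns(frame_bytes: int, link_gbps: float) -> float:
    """Time to place one frame on the wire, in nanoseconds.

    bits / (Gb/s) works out to nanoseconds, because 1 Gb/s is 1 bit per ns.
    """
    return frame_bytes * 8 / link_gbps


if __name__ == "__main__":
    FRAME_BYTES = 1500  # assumed frame size, for illustration only
    for gbps in (10, 25, 50, 100):
        delay = serialization_delay_ns(FRAME_BYTES, gbps)
        print(f"{gbps:>3} GbE: {delay:7.1f} ns per {FRAME_BYTES}-byte frame")
    # 10 GbE -> 1200.0 ns, 25 GbE -> 480.0 ns, 50 GbE -> 240.0 ns, 100 GbE -> 120.0 ns
```

Under these assumptions, moving from 10GbE to 25GbE removes 720 ns of serialization time per frame at every hop that has to receive the full frame before forwarding it.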
Mellanox VMA Message Accelerating Software
Another way an organization can see an immediate drop in latency is through the use of messaging acceleration software, which can enhance application performance by decreasing overall latency and minimizing server CPU workloads. This is one of the simplest and most effective ways to increase performance: the result can be a latency improvement of as much as 300 percent and an application throughput increase of as much as 200 percent (when compared to applications running on standard Ethernet). Mellanox’s messaging accelerator software, VMA, removes the operating system kernel from the message path and, by doing so, lowers latency. Because the CPU has less work to do per message, the system can achieve higher message rates while decreasing average latency.
VMA is a high-performance network offload library designed to accelerate unicast and multicast traffic for latency-sensitive and throughput-demanding applications. VMA runs on Linux, supports the TCP/UDP/IP network protocols, and requires no modifications to the application. Network processing is offloaded from the server’s CPU by passing traffic directly between the user-space application and the network adapter, bypassing the kernel and its IP stack and thus minimizing time-consuming context switches, buffer copies and interrupts. By cutting out this overhead, VMA is able to deliver extremely low latencies and improve overall networking performance.
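To illustrate what "no modifications to the application" means in practice, below is a minimal sketch of an ordinary UDP round-trip timer written against the standard sockets API. Nothing in it is VMA-specific; the function name `udp_round_trip` is hypothetical, and loopback is used only to keep the example self-contained (loopback traffic never reaches the adapter, so a real comparison would run the two ends on separate hosts).

```python
import socket
import time


def udp_round_trip(payload: bytes = b"ping", timeout_s: float = 1.0):
    """Send one datagram to a local echo socket and time the round trip."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.settimeout(timeout_s)
        client.settimeout(timeout_s)

        start = time.perf_counter_ns()
        client.sendto(payload, server.getsockname())
        data, client_addr = server.recvfrom(2048)
        server.sendto(data, client_addr)  # echo the datagram back
        reply, _ = client.recvfrom(2048)
        elapsed_ns = time.perf_counter_ns() - start
        return reply, elapsed_ns
    finally:
        server.close()
        client.close()


if __name__ == "__main__":
    reply, elapsed_ns = udp_round_trip()
    print(f"echoed {reply!r} in {elapsed_ns / 1000:.1f} us")
```

VMA is typically enabled by preloading its library when launching an unmodified program, for example `LD_PRELOAD=libvma.so python3 udp_echo.py`. The script name here is made up, and the library name and path should be checked against your own installation.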
IP Networks for Broadcast Media
VMA can improve the performance of any application that relies heavily on multicast or streaming, or that requires a high packet-per-second rate, low latency or increased application scalability. This includes a wide array of applications beyond high frequency trading environments. For instance, it plays well with next-generation IP studios that now handle real-time, high-resolution video, where a single uncompressed 4K stream can range from 8.3Gb/s to 20Gb/s. The Broadcast, Media and Entertainment industry is in the process of adopting 4K and 8K video streams. As this transition occurs, an existing 10GE interface can easily be saturated. Facilitating this transition is the fact that many studios are moving away from expensive, proprietary Serial Digital Interface (SDI) equipment to off-the-shelf commodity Ethernet products that offer greater speeds at lower cost. Replacing SDI requires a solution that is capable of low jitter and efficient memory utilization, and it also helps to offload the CPU to free up system resources. This is a perfect environment for high-performance Ethernet and VMA message accelerator software.
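The 8.3Gb/s to 20Gb/s range quoted above can be sanity-checked with simple arithmetic. The sketch below assumes 3840x2160 frames with 4:2:2 chroma sampling at 10 bits per sample (an average of 20 bits per pixel) and counts active video only, with no blanking or packet overhead. The article does not state which formats its figures refer to, so the frame rates are assumptions that happen to bracket the quoted range.

```python
def uncompressed_gbps(width: int, height: int, fps: int, bits_per_pixel: int) -> float:
    """Raw active-video bit rate in Gb/s (no blanking, no packet overhead)."""
    return width * height * fps * bits_per_pixel / 1e9


if __name__ == "__main__":
    BITS_PER_PIXEL = 20  # assumed: 4:2:2 sampling, 10 bits per sample
    for label, w, h, fps in [
        ("4K  50 fps", 3840, 2160, 50),
        ("4K  60 fps", 3840, 2160, 60),
        ("4K 120 fps", 3840, 2160, 120),
        ("8K  60 fps", 7680, 4320, 60),
    ]:
        print(f"{label}: {uncompressed_gbps(w, h, fps, BITS_PER_PIXEL):6.2f} Gb/s")
    # 4K 50 fps -> 8.29, 4K 60 fps -> 9.95, 4K 120 fps -> 19.91, 8K 60 fps -> 39.81
```

Even the 50 fps case leaves little headroom on a 10GE link once packet overhead is added, and an 8K stream under these assumptions needs more than a single 25GE port.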
Low-latency Network Techniques
In the continued search for the ultimate in low latency, there are still other possibilities to discuss, such as minimizing interrupts on the processor and reducing the processor’s role in handling network traffic, both of which can affect overall latency. Through the use of technologies such as RDMA, DPDK and Mellanox’s own ASAP2, Mellanox ConnectX®-5 adapters are able to improve performance by performing network processing tasks on behalf of the CPU and by bypassing the OS kernel to free system resources. This offers an enormous performance improvement over traditional network and transport choices. Simply moving to a technology stack that supports advanced kernel bypass techniques on a fast, high-performance underlying network results in significantly higher data transfer rates at lower latency. Together, these technologies can dramatically improve application response time and scaling, reducing latency and increasing performance across a variety of market segments.
Data Plane Development Kit (DPDK)
In case you are not familiar with DPDK, it provides a programming framework that optimizes the data path for applications and, in doing so, enables them to process data packets faster. DPDK is beneficial for applications that must handle a substantial amount of Ethernet packet processing or high message rates, such as financial applications or virtualized network functions.
Let’s take a step back to explain exactly how DPDK is able to increase an application’s performance. First, we need to understand that when sending and receiving Ethernet packets from user space through the operating system, as most applications do, there is a performance penalty that must be paid. These penalties occur as a normal part of sending data. Each send or receive is a system call that switches the CPU from user mode to kernel mode, and the data is copied between user space memory and kernel space memory. The CPU is used for this procedure as well as for processing the network protocol stack (i.e. the OSI network layers) for each packet traveling between the NIC and the kernel. All of this adds precious time and consumes CPU cycles.
DPDK enables a faster path for applications to communicate with the NIC by avoiding both the switching between user space and kernel space and the kernel’s processing of the protocol stack. Instead of taking the usual route through the network layers with context switching, DPDK lets the application exchange packets with the NIC directly from user space. The application is then able to complete its data processing faster and, because CPU cycles are no longer spent on kernel packet processing, system-wide efficiency is increased.
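The receive pattern that makes this possible is a user-space loop that polls the NIC queue and pulls packets in bursts, rather than waiting for an interrupt and a system call per packet. The toy model below is not DPDK code: a Python `deque` stands in for the NIC receive ring, and `rx_burst`, `poll_loop` and the burst size of 32 are illustrative names and values, loosely modeled on the burst-style receive call that DPDK exposes in C.

```python
from collections import deque

BURST_SIZE = 32  # assumed burst size, for illustration only


def rx_burst(rx_ring: deque, max_pkts: int = BURST_SIZE) -> list:
    """Non-blocking: return up to max_pkts packets currently in the ring."""
    burst = []
    while rx_ring and len(burst) < max_pkts:
        burst.append(rx_ring.popleft())
    return burst


def poll_loop(rx_ring: deque, handle, max_polls: int) -> int:
    """Poll the ring a fixed number of times; return the packets handled."""
    handled = 0
    for _ in range(max_polls):
        for pkt in rx_burst(rx_ring):  # an empty burst simply means "poll again"
            handle(pkt)
            handled += 1
    return handled


if __name__ == "__main__":
    ring = deque(f"pkt-{i}".encode() for i in range(100))  # stand-in for a NIC RX ring
    seen = []
    total = poll_loop(ring, seen.append, max_polls=5)
    print(total, "packets handled in bursts of up to", BURST_SIZE)
```

The trade-off of this design is that the polling core stays busy even when no packets arrive, which is why poll-mode applications usually dedicate specific cores to the data path.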
Similarly, ASAP2 enables the majority of data plane switching and packet processing to be offloaded to the Mellanox NIC, while control plane operations of the virtual switch remain in Open vSwitch (OVS). Because ASAP2 offloads packet forwarding tasks from OVS to the Mellanox adapter, it can deliver the highest and most deterministic packet performance with minimal CPU overhead.
The final topic I’d like to introduce is Packet Pacing. Packet pacing can be used to optimize data transfers where multiple synchronized streams all send data at the same time. In scenarios like this, the sheer quantity of data being sent often collides at the switch, overflowing its buffers. This effect can be detrimental to content delivery for video streams. Video streaming services must send video to consumers around the world with different access speeds. Without packet pacing, network queueing delays, dropped packets and buffer overflows can occur, each of which can cause a bad viewing experience for the end user.
Packet pacing regulates the transmission rate by spacing packets at an interval that will not overrun the capabilities of the receiving node. The interval is determined by the round-trip time (RTT), the time measured between sending an earlier packet and receiving its response. Probably the best example of this is Netflix, which uses the Mellanox ConnectX-4 adapter with the Packet Pacing offload enabled to deliver content to over 100K subscribers from a single streaming box. Packet pacing has been shown to improve performance by 2-4x and helps avoid degradation of overall network transfer speeds. To further enhance media streaming applications, Packet Pacing can be combined with VMA and implemented without any change to the application.
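As a rough illustration of how an RTT-derived interval spreads traffic out, the sketch below spaces one window of data evenly across one round trip instead of sending it as a single burst. This is a generic software model of window/RTT pacing, not a description of how the ConnectX offload computes its rates. The function names and the example RTT, window and packet sizes are all assumptions.

```python
def pacing_interval_us(rtt_ms: float, window_bytes: int, packet_bytes: int) -> float:
    """Gap between packets so that one window is spread evenly over one RTT."""
    packets_per_rtt = window_bytes / packet_bytes
    return rtt_ms * 1000.0 / packets_per_rtt


def send_schedule_us(num_packets: int, interval_us: float) -> list:
    """Departure time of each packet, in microseconds from the first send."""
    return [i * interval_us for i in range(num_packets)]


if __name__ == "__main__":
    # Assumed example: 20 ms RTT, 150,000-byte window, 1500-byte packets.
    interval = pacing_interval_us(rtt_ms=20, window_bytes=150_000, packet_bytes=1500)
    print(f"send one packet every {interval:.0f} us")  # 200 us
    print(send_schedule_us(4, interval))               # [0.0, 200.0, 400.0, 600.0]
```

In this example the stream averages 60 Mb/s either way, but with pacing the switch sees one packet every 200 microseconds instead of 100 packets back to back, which is what keeps many synchronized streams from overflowing its buffers.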
For High Frequency Trading applications in particular, saving a few microseconds of latency can be worth millions of dollars. The Broadcast Media and Entertainment market, which has been moving toward higher-resolution video, struggles to deliver the bandwidth needed to keep pace with 4K and 8K formats. As these industries try to maintain a competitive advantage and keep pace with technology evolution, Mellanox will continue to evolve IP technology to incorporate a variety of performance and low-latency advantages into an end-to-end Ethernet solution. Mellanox switches, adapters, cables and transceivers, and application acceleration software deliver the most advanced networking technologies, capable of meeting the low-latency requirements of the most challenging environments. They are the most efficient offerings, with the highest density and lowest jitter in the industry. This gives you investment protection, with the ability to start at 25 Gb/s and upgrade speeds as needed with little to no change. And with aggressive pricing, Mellanox allows you to resolve your most challenging concerns at the lowest total cost of ownership for applications that mandate ultra-low latency.