Over the last five years, compute and storage technology has achieved substantial performance increases, while at the same time being held back by the bandwidth limitations of PCI Express Gen3 (PCIe Gen3). AMD is the first x86 processor company to release support for the fourth-generation PCIe bus (PCIe Gen4) with the AMD EPYC™ 7002 Series Processor. This is the second-generation AMD EPYC™ processor, but the first x86 data center processor with PCIe Gen4 support, delivering substantial system performance improvements by doubling the bandwidth available to storage, networking, and other peripherals compared to CPUs that only support PCIe Gen3. AMD EPYC™ 7002 Series Processors also offer more PCIe lanes and support for more DRAM capacity, allowing them to provide the industry’s highest PCIe bandwidth and memory capacity.
The new AMD EPYC™ 7002 Series Processor
The new AMD EPYC™ 7002 Series Processor delivers advanced processing capabilities that can unleash giant performance gains for a wide variety of workloads, and it is aimed at addressing new data center challenges. It offers up to 64 multithreaded cores per chip, for a total of 128 processing cores in a dual-socket server. In a single-socket server, it delivers dual-socket performance and I/O without the dual-socket price tag. AMD is also the first to bring to market an x86 data center processor based on 7nm process technology. With double the core density and optimizations that improve instructions per cycle, the result is 4x the floating-point performance of 1st Gen AMD EPYC™. Using 7nm process technology also brings energy efficiency, so the 2nd Gen AMD EPYC™ can provide the same performance at half the power consumption. That is amazing!
Alongside its high core count there is an extra pair of memory channels, allowing AMD EPYC™ 7002 Series Processors to take advantage of up to 4TB of RAM in a single-socket server and 8TB in a dual-socket server with 256GB DIMMs. For companies looking to host multi-tenant workloads, the option of adding more DRAM means more tenants can be added per server, which translates to a substantial increase in revenue streams.
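The 4TB and 8TB figures follow from simple DIMM arithmetic. As a rough sketch, assuming 8 memory channels per socket and 2 DIMMs per channel (neither number is stated above; both are assumptions for illustration):

```python
# Assumed platform layout (not stated in the text above).
CHANNELS_PER_SOCKET = 8
DIMMS_PER_CHANNEL = 2
DIMM_SIZE_GB = 256  # DIMM size quoted in the text


def memory_capacity_tb(sockets: int) -> float:
    """Return total DRAM capacity in TB for a server with the given socket count."""
    dimms = sockets * CHANNELS_PER_SOCKET * DIMMS_PER_CHANNEL
    return dimms * DIMM_SIZE_GB / 1024


single_socket_tb = memory_capacity_tb(1)
dual_socket_tb = memory_capacity_tb(2)
print(f"single socket: {single_socket_tb:.0f} TB, dual socket: {dual_socket_tb:.0f} TB")
```

Under these assumptions a socket holds 16 DIMMs, which matches the capacities quoted above.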
Mellanox ConnectX Adapters
Mellanox ConnectX offers 200Gb/s InfiniBand (HDR) and Ethernet connectivity, with sub-600 nanosecond latency and up to 200 million messages per second. Mellanox ConnectX SmartNICs and BlueField I/O Processing Units (IPU) are the world’s first PCIe Gen4 smart adapters. The ConnectX smart adapter solutions are optimized to provide breakthrough performance and scalability with the new AMD EPYC™ 7002 Series Processor for the most demanding compute and storage infrastructures. By using more of the faster PCI Express 4.0 lanes, Mellanox ConnectX 100 and 200 gigabit per second adapters can achieve full I/O throughput with direct connectivity to 24 NVMe storage drives in a single system. The combination of Mellanox adapters with PCIe Gen4 support and the 2nd Gen AMD EPYC™ processor is ideal for advanced server and storage solutions, giving high-performance computing, artificial intelligence, cloud and enterprise data centers the high data bandwidth they need for the most compute- and storage-demanding applications. By leveraging the PCIe Gen4 support in both 2nd Gen AMD EPYC™ processors and ConnectX adapters, mutual customers can maximize their data center return on investment.
Kernel Bypass Technology
Network and storage processing are very CPU-intensive operations; however, the CPU must handle not only these data movement and processing tasks but also the application workloads themselves. Mellanox ConnectX adapters utilize offloads and accelerators such as Accelerated Switching and Packet Processing (ASAP²), Remote Direct Memory Access (RDMA), and overlay network encap/decap (e.g. for VXLAN) to relieve the CPU of I/O tasks and enable the industry’s lowest network latency. This allows for more efficient data movement for the network, storage devices and application workloads, resulting in lower application latency and leaving more CPU cycles available to accelerate applications and processes.
Impact on Compute and Storage
The improved PCIe Gen4 bandwidth and added PCIe lane count directly help tackle the growing need for more compute processing and storage bandwidth. Most of that bandwidth is needed on the PCIe bus, which is the path to local and networked storage and to the network links connecting other servers. The added memory is a bonus for storage solutions where a large memory cache is needed, and up to 4TB of memory for a single socket leaves a lot of headroom for future workloads.
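To make the doubling concrete, here is a back-of-the-envelope sketch of per-direction link bandwidth. It uses the per-lane signaling rates and 128b/130b encoding from the PCIe specifications, and it ignores packet and protocol overhead, so real-world throughput will be somewhat lower. The x16 slot width is an assumption for illustration.

```python
# Per-lane raw signaling rates in GT/s from the PCIe specifications.
TRANSFER_RATE_GT_S = {"gen3": 8.0, "gen4": 16.0}
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding used by Gen3 and Gen4


def link_bandwidth_gbps(gen: str, lanes: int) -> float:
    """Usable bandwidth in Gb/s per direction, before packet/protocol overhead."""
    return TRANSFER_RATE_GT_S[gen] * ENCODING_EFFICIENCY * lanes


gen3_x16 = link_bandwidth_gbps("gen3", 16)  # assumed x16 slot
gen4_x16 = link_bandwidth_gbps("gen4", 16)
ADAPTER_GBPS = 200  # the 200Gb/s adapter speed quoted earlier

for name, bw in (("Gen3 x16", gen3_x16), ("Gen4 x16", gen4_x16)):
    verdict = "enough" if bw >= ADAPTER_GBPS else "not enough"
    print(f"{name}: {bw:.1f} Gb/s ({bw / 8:.2f} GB/s), {verdict} for {ADAPTER_GBPS} Gb/s")
```

By this estimate a Gen3 x16 slot tops out near 126 Gb/s, short of a 200Gb/s adapter, while a Gen4 x16 slot offers roughly 252 Gb/s.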
Where will we see AMD EPYC™ 7002 Series Processors fitting in initially? There are many use cases, but one of the first might be single-socket Windows Storage Spaces Direct (S2D) solutions. These are typically 1U and 2U platforms that support a multi-node, hyperconverged infrastructure (HCI) deployment. Building them with 2nd Gen AMD EPYC™ processors will allow more dedicated NVMe PCIe lanes without the need for a PCIe NVMe switch. That means more NVMe SSDs, with higher storage throughput and IOPS available for workloads running on these platforms.
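Whether a switch is needed comes down to a lane budget. The sketch below assumes a single-socket platform exposing 128 PCIe lanes, an x4 link per NVMe SSD, and an x16 link per network adapter. None of these values are given above, and actual platforms may reserve some lanes for other devices.

```python
TOTAL_LANES = 128      # assumed lanes exposed by a single-socket platform
LANES_PER_NVME = 4     # assumed x4 link per NVMe SSD
LANES_PER_NIC = 16     # assumed x16 link per network adapter


def lanes_left(nvme_drives: int, nics: int) -> int:
    """Lanes remaining after direct-attaching drives and adapters (negative = switch needed)."""
    used = nvme_drives * LANES_PER_NVME + nics * LANES_PER_NIC
    return TOTAL_LANES - used


spare = lanes_left(nvme_drives=24, nics=1)
print(f"spare lanes with 24 NVMe drives and 1 NIC: {spare}")
```

With these assumptions, 24 directly attached NVMe drives plus one x16 adapter still leave lanes to spare. A negative result would mean some devices have to share lanes behind a switch.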
In a hyperconverged solution, one could favor a higher clock speed CPU over a higher core count, since most common virtual machine workloads each use 2-4 virtual CPUs. With 16 cores and 1TB of RAM, the AMD EPYC™ 7002 Series Processor would provide a solution that increases core density without adding the cost of a dual-socket setup.
Again leading the charge to adopt new technology, the cloud computing market is already taking advantage of the massive compute capacity of AMD EPYC™ 7002 Series Processors. Microsoft Azure is already offering its customers industry-leading compute performance for all their workloads. After being the first global cloud provider to announce the deployment of AMD EPYC™ 7001 Series Processor based Azure Virtual Machines in 2017, Microsoft has been working together with AMD and Mellanox to continue bringing the latest computing innovation to enterprises of all sizes and shapes. Azure Virtual Machines provide more customer choice to meet a broad range of requirements for general purpose workloads using the new AMD EPYC™ 7002 Series Processor and Mellanox SmartNICs.
Impact on 5G, NFV and Edge Cloud
For telecommunication carriers and multi-service operators looking to deploy virtualized telco cloud infrastructure to support 3GPP 5G CUPS, Network Functions Virtualization (NFV) and Multi-Access Edge Computing (MEC) workloads, pairing the highest-capacity, most economical compute with the fastest, most efficient network means the highest performance at the lowest cost for service provider applications. Given the pressure to reduce CapEx and OpEx in the service provider industry, the combination of AMD EPYC™ 7002 Series Processors and Mellanox SmartNICs very quickly translates to the highest return on investment and the fastest time to ARPU (average revenue per user).
When the Rubber Meets the Road
We decided to put an AMD EPYC™ 7002 Series Processor based server with Mellanox ConnectX-5 PCIe Gen4 SmartNICs to the test in both virtualized and bare metal OpenStack cloud environments. The OMG performance results of our telco benchmark testing are summarized below.
In bare metal server testing, we saw over 197 million packets per second (Mpps) at 64-byte frames and over 93Gbps or just over 97% of line rate. While running at 1518-byte frames and utilizing dual ports of a ConnectX-5 with PCIe Gen4 connectivity to an AMD EPYC™ 7002 Series Processor with 16 cores, there was still ample room left for application processing, with three-fourths of the cores unused and available. Theoretically, with just a single-socket AMD EPYC™ 7002 Series Processor 64-core system that supports 4 PCIe Gen4 slots, using Mellanox ConnectX-5 SmartNICs, one could achieve a 600 Mpps packet rate or 400Gbps aggregate throughput on a single-CPU server. That really is OMG performance!
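The theoretical 600 Mpps figure can be sanity-checked against Ethernet line rate. This is a minimal sketch, assuming each of the four adapters contributes 100Gb/s (one reading of the 400Gbps aggregate; the text does not spell out the per-slot breakdown) and the standard 20 bytes of preamble and inter-frame gap per frame:

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def line_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum packets per second (in millions) a link can carry at a given frame size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire / 1e6


per_port = line_rate_mpps(100, 64)   # assumed 100Gb/s per adapter
four_ports = 4 * per_port
print(f"100GbE, 64-byte frames: {per_port:.1f} Mpps per port, {four_ports:.0f} Mpps across four ports")
```

At 64-byte frames a 100Gb/s link carries at most about 148.8 Mpps, so four such links come to just under 600 Mpps, consistent with the rounded figure above.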
In a virtualized server environment, we compared ASAP² OVS hardware offload against OVS-DPDK using multi-tenant UDP traffic. ASAP² was able to achieve 67 Mpps at 114-byte frame size and 87.84% of line rate at 1518-byte frame size, all without any CPU cores required for the network load (i.e. UDP VXLAN packet processing). With OVS-DPDK for multi-tenant UDP traffic, we were able to achieve only 6.6 Mpps for 114-byte frames, and just 33.2Gb/s or 33.2% of line rate for 1518-byte frames, while still consuming 12 CPU cores for packet processing. Thus, by utilizing ASAP², we were able to achieve up to 10X the packet rate and 2.5X the throughput of OVS-DPDK for overlay UDP traffic without consuming any CPU cores. Without ASAP² technology, the massive compute capacity available in AMD EPYC™ 7002 Series Processors could remain untapped for lack of high-speed network traffic. Indeed, this proves the well-known adage that faster compute needs faster networks! Mellanox SmartNICs achieve OMG performance with AMD EPYC™ 7002 Series Processors.
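The headline ratios can be reproduced directly from the numbers quoted above. The sketch below only restates that arithmetic. The variable names are illustrative, and the per-core figure is a derived value rather than a separately measured result.

```python
# Figures quoted in the text above.
asap2_mpps, dpdk_mpps = 67.0, 6.6               # 114-byte frames
asap2_line_pct, dpdk_line_pct = 87.84, 33.2     # 1518-byte frames, % of line rate
dpdk_cores = 12                                 # cores consumed by OVS-DPDK

packet_rate_gain = asap2_mpps / dpdk_mpps
throughput_gain = asap2_line_pct / dpdk_line_pct
dpdk_mpps_per_core = dpdk_mpps / dpdk_cores     # derived, not measured separately

print(f"packet rate gain: {packet_rate_gain:.1f}x")
print(f"throughput gain:  {throughput_gain:.2f}x")
print(f"OVS-DPDK packet rate per core: {dpdk_mpps_per_core:.2f} Mpps")
```

Both ratios land slightly above the rounded 10X and 2.5X figures.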
In our final test case, we tested OVS performance for UDP-only traffic with all 12 CPU cores dedicated to OVS running over DPDK. The graph above shows the performance results for various packet sizes and the percentage of line rate achieved under this test methodology. At 64-byte frame size, OVS was able to achieve 25.8 Mpps. This is amazing performance!
With the release of the industry’s first PCIe Gen4-capable x86 CPU, the AMD EPYC™ 7002 Series Processor, AMD has revolutionized the computing industry, making massive compute capacity available for all kinds of workloads. The collaboration between Mellanox and AMD has been at the heart of this sea change. Together with AMD EPYC™ 7002 Series Processors, Mellanox SmartNICs are enabling smarter, better, and faster networking without compromising the efficiency of modern cloud native data centers. Beyond the phenomenal benchmarking performance already demonstrated for HPC, storage and cloud computing workloads, Mellanox has now also validated the OMG performance gained from the combination of AMD EPYC™ 7002 Series Processors and ConnectX network adapters for telecommunications and service provider use cases.
 Based on June 8, 2018 AMD internal testing of same-architecture product ported from 14 to 7 nm technology with similar implementation flow/methodology, using performance from SGEMM. EPYC-07