Mellanox EDR InfiniBand accelerates the world’s new top high-performance computing (HPC) and artificial intelligence (AI) system, named Summit, at the Oak Ridge National Laboratory. Summit delivers 200 petaflops of performance and uses a dual EDR InfiniBand network to provide an overall throughput of 200 gigabits per second to each compute server. Performance matters, even for management networks. So, while Mellanox InfiniBand accelerates the front-end performance of this supercomputer, Mellanox Ethernet switches were tapped to provide the infrastructure for back-end management duties.
This Ethernet management network reliably connects all of Summit’s 4,600 nodes, providing the performance necessary for mission-critical management services such as booting the nodes, NFS file access, LDAP, and job launch initiation. On the same physical network, a separate virtual network is used for less bandwidth-intensive tasks such as IPMI, console access, syslog, health checking, and network time assignment and synchronization. A reliable management network is imperative for gathering important telemetry data. The same network is also used for periodic housekeeping duties such as pushing firmware upgrades and applying security patches.
“We are proud to accelerate the world’s top HPC and AI supercomputer at the Oak Ridge National Laboratory, a result of a great collaboration over the last few years between Oak Ridge National Laboratory, IBM, NVIDIA and us,” said Eyal Waldman, president and CEO of Mellanox Technologies. “Our InfiniBand smart accelerations and offload technology delivers highest HPC and AI applications performance, scalability, and robustness. InfiniBand enables organizations to maximize their data center return-on-investment and improve their total cost of ownership and, as such, it connects many of the top HPC and AI infrastructures around the world. We look forward to be part and to accelerate new scientific discoveries and advances in AI development, to be performed and enabled by Summit.”
The need to analyze growing amounts of data, to support complex simulations, to overcome performance bottlenecks, and to create intelligent data algorithms requires the ability to manage and carry out computational operations on the data as it is being transferred by the data center interconnect. Mellanox InfiniBand solutions incorporate In-Network Computing technology, which performs data algorithms within the network devices, delivering ten times higher performance and enabling the era of “data-centric” data centers. Combined with the Mellanox Ethernet fabric for back-end infrastructure management, InfiniBand delivers the most robust solution available.
“Summit HPC and AI-optimized infrastructure enables us to analyze massive amounts of data to better understand world phenomena, to enable new discoveries and to create advanced AI software,” said Buddy Bland, Program Director at the Oak Ridge Leadership Computing Facility. “InfiniBand In-Network Computing technology is a critical new technology that helps Summit achieve our scientific and research goals. We are excited to see the fruits of our collaboration with Mellanox over the last several years through the development of the In-Network Computing technology, and look forward to take advantage of it for achieving highest performance and efficiency for our applications.”