By Elad Wind and Yuval Itkin
There’s a saying in the IT industry that when you go cloud, everything must scale. Server design, power consumption, network architecture, compute density, component costs, heat dissipation, and all their related aspects must be designed for efficiency when one administrator might oversee thousands of servers in a datacenter. Likewise, all hardware and software operations (setup, installation, upgrades, replacement, repair, migration, and monitoring) must be automated as much as possible. Each network or server admin is responsible for so many machines, connections, and VMs or containers that it is impossible to manually manage or even monitor each individual system or process, such as a firmware upgrade or the reboot of a stuck server.
The BMC is a specialized chip at the center of the server that handles all “inside the box” management and monitoring operations. BMCs interface with server hardware and monitor input from sensors, including temperature, fan speed, and power status. They do not rely on the main operating system (OS), so a BMC can send alerts or accept commands even before the OS has loaded or if the OS crashes. BMCs can send network alerts to the administrator or to operations software. They are also used to remotely reset or power cycle a stuck system, or to facilitate a firmware upgrade for a network interface card (NIC).
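To make the monitoring role concrete, here is a minimal sketch of the kind of out-of-band sensor check a BMC performs: read sensor values, compare them against thresholds, and raise alerts independently of the host OS. The sensor names and limits below are hypothetical, not taken from any real platform.

```python
# Illustrative sketch of a BMC-style sensor check. Sensor names and
# limits are hypothetical examples, not from a real platform.

SENSOR_LIMITS = {
    "cpu_temp_c": {"upper": 95.0},
    "fan1_rpm":   {"lower": 1000.0},
    "psu_volts":  {"lower": 11.4, "upper": 12.6},
}

def check_sensors(readings):
    """Return a list of alert strings for out-of-range readings."""
    alerts = []
    for name, value in readings.items():
        limits = SENSOR_LIMITS.get(name, {})
        if "upper" in limits and value > limits["upper"]:
            alerts.append(f"{name}: {value} above {limits['upper']}")
        if "lower" in limits and value < limits["lower"]:
            alerts.append(f"{name}: {value} below {limits['lower']}")
    return alerts

# An overheating CPU and a stalled fan trigger alerts; the PSU is in range.
for alert in check_sensors({"cpu_temp_c": 98.2,
                            "fan1_rpm": 450.0,
                            "psu_volts": 12.1}):
    print(alert)
```

In a real BMC this loop runs continuously against hardware sensor buses, and the alerts go out over the network as IPMI events or Redfish notifications rather than print statements.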
Traditionally, the BMC runs on a proprietary chip with a proprietary OS. Communication with the BMC has relied on the IPMI protocol, with each server platform exposing a different set of commands and tools to match its specific hardware architecture.
At the hyperscale level, architects and administrators don’t want different server brands and models to use different BMC commands or different implementations of IPMI. They want every server to use the same BMC setup, and they may need to review or modify the BMC source code. A much better solution for large-scale cloud operations is therefore the open-source, Linux-based OpenBMC, a project pioneered by Facebook that standardizes the interface with BMC hardware. This was the impetus for our implementation.
To support large-scale cloud operations, Mellanox implemented the first uniform NIC OpenBMC interface, which solves the complexities around firmware upgrades. OCP server platforms based on OpenBMC can now pull each NIC’s firmware image and perform firmware updates on Mellanox NICs.
Provisioning has always been a challenge for hyperscale data centers, and so far it has been addressed with vendor tools running on the servers. Hyperscalers, with their massive numbers of servers and network connections, deal with painfully inconsistent APIs and interfaces, which leads to endless effort porting tool sets to match the different components and drivers across their many different servers and NICs.
Bringing uniformity across devices and event/error logs is tremendously valuable, as it removes the challenges of integrating many multi-vendor and proprietary protocols. With OpenBMC, one set of operating processes, scripts, and monitoring tools can monitor, reboot, and upgrade firmware across many different types of servers.
“We are pleased with Mellanox’s implementation of ConnectX NIC support for firmware updates through the OpenBMC. The results are impressive,” said Ben Wei, software engineer from the Facebook OpenBMC development team. “The joint development was based on a draft (Work-In-Progress) DMTF spec that specifically required close, timely cooperation with Mellanox.”
Mellanox is a pioneer in OCP server support: our first OCP mezzanine (mezz.) 0.5 NIC, later OCP mezz. 2.0 NICs, and now OCP NIC 3.0 adapters are deployed with hyperscalers, enterprises, and OEMs. Mellanox is also a leading contributor to OCP system management, system security, and NIC hardware and mechanical designs.
The OCP Hardware Management Project specifies interoperable manageability for OCP platforms and OCP NICs, leveraging DMTF standards for communication with the BMC.
The NICs serve as the BMCs’ gateway to the outside world, through NCSI, MCTP, PLDM messaging layers and underlying physical transports (see illustration below). Mellanox is actively contributing to the DMTF system management MCTP and PLDM specs to enable more flexible and efficient remote reporting and management for OCP servers.
NICs serve as the BMCs’ gateway to the outside world through NCSI, MCTP, PLDM messaging layers
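As an illustration of how these messaging layers nest, here is a sketch of MCTP packet encapsulation. The 4-byte transport header layout follows the general shape of DMTF DSP0236 (header version, destination and source endpoint IDs, and a flags byte carrying start/end-of-message, packet sequence, tag owner, and message tag), and PLDM rides inside as MCTP message type 0x01. The endpoint IDs and payload bytes below are arbitrary examples, not values from a real system.

```python
import struct

# Sketch of MCTP encapsulation; header layout follows DMTF DSP0236's
# transport header. EIDs and payload bytes are illustrative examples.

MCTP_HDR_VERSION = 0x01
MSG_TYPE_PLDM = 0x01  # MCTP message type code assigned to PLDM

def mctp_packet(dest_eid, src_eid, msg_tag, payload,
                som=True, eom=True, pkt_seq=0, tag_owner=True):
    """Build one MCTP packet: 4-byte transport header + message body."""
    flags = ((1 if som else 0) << 7 |        # Start Of Message
             (1 if eom else 0) << 6 |        # End Of Message
             (pkt_seq & 0x3) << 4 |          # packet sequence number
             (1 if tag_owner else 0) << 3 |  # Tag Owner bit
             (msg_tag & 0x7))                # message tag
    header = struct.pack("BBBB", MCTP_HDR_VERSION, dest_eid, src_eid, flags)
    return header + bytes([MSG_TYPE_PLDM]) + payload

pkt = mctp_packet(dest_eid=0x10, src_eid=0x0A, msg_tag=0,
                  payload=b"\x80\x05\x01")
print(pkt.hex())
```

Each lower layer wraps the one above it the same way: PLDM messages sit inside MCTP packets, which in turn ride over a physical transport such as SMBus, PCIe, or, as discussed below, RBT.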
The DMTF defines standard protocols for monitoring and control (DSP0248) and for firmware update (DSP0267). The DMTF has declared its intention to introduce standard support for Platform Level Data Model (PLDM) protocols over RMII Based Transport (RBT) as part of NC-SI (DSP0222) revision 1.2, planned for release this year; we used the published Work-In-Progress document in our implementation.
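To give a feel for what a DSP0267 firmware update involves, here is a sketch of the exchange as seen from the update agent (the BMC). The step names follow the spec’s command names, but the control flow is simplified: in the real protocol the device itself initiates RequestFirmwareData and the *Complete reports, and the hypothetical `step_ok` hook stands in for a full PLDM transport.

```python
# Simplified sketch of the DSP0267 firmware-update sequence. The step
# names follow the spec; `step_ok` is a hypothetical transport hook.

UPDATE_STEPS = [
    "QueryDeviceIdentifiers",  # identify the device to be updated
    "GetFirmwareParameters",   # read active/pending versions, capabilities
    "RequestUpdate",           # put the device into update mode
    "PassComponentTable",      # describe components in the update package
    "UpdateComponent",         # begin transferring one component
    # ...the device pulls image data with RequestFirmwareData, then reports:
    "TransferComplete",        # image received
    "VerifyComplete",          # image verified
    "ApplyComplete",           # image written to flash
    "ActivateFirmware",        # switch over to the new image
]

def run_update(step_ok):
    """Walk the update sequence; abort on the first failed step."""
    for step in UPDATE_STEPS:
        if not step_ok(step):
            return f"update aborted at {step}"
    return "update complete"

print(run_update(lambda step: True))  # a device that succeeds at every step
```

The value of the standard is that this same sequence works against any compliant device, regardless of vendor, which is exactly the uniformity hyperscalers are after.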
RBT is the main management interface in OCP platforms. With PLDM over RBT, OCP platforms leverage the higher throughput of RBT (over RMII) compared to MCTP over SMBus. RBT is also more widely deployed than MCTP over PCIe, and it offers higher availability: the PCIe interface is not available when a system is placed into a low-power state such as S3, but RBT can still operate in those states. And, of course, RBT can operate over Ethernet.
Initially, PLDM protocols were supported only over MCTP transports. By introducing support for PLDM over RBT, PLDM protocols can now be used in standard OCP platforms. Mellanox supports this new ability to send PLDM information over RBT.
The ability to automate firmware updates over the network using the BMC often raises security concerns: how does one prevent unauthorized firmware from sneaking into a NIC? Fortunately, Mellanox products support Secure Firmware Update, which ensures that only properly signed firmware images, created by Mellanox, can be programmed into Mellanox devices. Secure Firmware Update prevents malicious firmware from being loaded onto Mellanox NICs and is fully supported even when using PLDM over RBT.
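The core idea behind secure firmware update can be sketched in a few lines: the device accepts an image only if its signature verifies against a key provisioned in the hardware. Real devices use asymmetric signatures (for example RSA) against a fused vendor public key; a symmetric HMAC stands in below purely to keep the sketch short and self-contained, and all the names and payloads are illustrative.

```python
import hashlib
import hmac

# Simplified illustration of signature-gated firmware acceptance.
# Real devices verify an asymmetric (e.g. RSA) signature against a
# vendor public key fused into the hardware; an HMAC stands in here.

DEVICE_KEY = b"vendor-provisioned-key"  # stand-in for a fused key

def sign_image(image: bytes, key: bytes) -> bytes:
    """Produce the signature the signing server would attach to an image."""
    return hmac.new(key, image, hashlib.sha256).digest()

def device_accepts(image: bytes, signature: bytes) -> bool:
    """Device-side check: program the image only if the signature verifies."""
    expected = sign_image(image, DEVICE_KEY)
    return hmac.compare_digest(expected, signature)

good = b"firmware v2.1 payload"
sig = sign_image(good, DEVICE_KEY)
print(device_accepts(good, sig))         # genuine image: accepted
print(device_accepts(b"tampered", sig))  # modified image: rejected
```

Because the check happens inside the device, it holds no matter which path the image arrived by, including an automated BMC-driven update over PLDM.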
The next steps being defined in DMTF and OCP will enable system security protocols for device authentication.
These new standards, along with Mellanox’s plans to support them, will bring further automation and standardization to large-scale datacenter operations.
For over a decade, the DMTF has been creating open manageability standards to improve the interoperable management of information technologies in servers and desktops.
PMCI – Platform Management Components Intercommunications
The DMTF working group that develops standards to address “inside the box” communication interfaces.
PLDM – Platform Level Data Model
A data model for efficient communications between platform management components. Allows access to low-level platform inventory, monitoring, and control functions including firmware update.
MCTP – Management Component Transport Protocol
A transport-layer protocol that allows the various “inside the box” management protocols to run over physical interfaces other than RBT, such as SMBus and PCIe.