All posts by Elad Wind

About Elad Wind

Elad Wind is currently Director of Solutions Engineering, promoting the adoption of Mellanox interconnect solutions by hyperscalers. Since 2010, Elad has served in various technical and sales roles at Mellanox, including Product Sales and Project Management. Elad was also a founding member of Mellanox Singapore, the APAC head office. Elad holds an MBA from Tel-Aviv University and ESSEC Business School Paris, and a Bachelor of Science degree in Electrical Engineering from the Technion, Israel.

OpenBMC Automates Cloud Operations

By Elad Wind and Yuval Itkin

There’s a saying in the IT industry that when you go cloud, everything must scale to maximum levels. Server design, power consumption, network architecture, compute density, component costs, heat dissipation and all their related aspects must be designed for efficiency when one administrator might oversee thousands of servers in a datacenter. Likewise, all hardware and software operations must be automated as much as possible: setup, installation, upgrades, replacements, repair, migration and monitoring. Each network or server admin is responsible for so many machines, connections and VMs or containers that it’s impossible to manually manage, or even monitor, each individual system or process, such as a firmware upgrade or a stuck server that needs rebooting.

The Baseboard Management Controller (BMC) is key to automation

The BMC is a specialized chip in the center of the server handling all “inside the box” management and monitoring operations. BMCs interface with server hardware and monitor input from sensors including temperature, fan speed, and power status. They do not rely on the main operating system (OS), so a BMC can send alerts or accept commands even before the OS has loaded or in case the OS crashes. BMCs can send network alerts to the administrator or operations software. They are also used to remotely reset or power cycle a stuck system or to facilitate a firmware upgrade for a network interface card (NIC).
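The out-of-band logic described above can be sketched in a few lines. This is a minimal illustration only: the sensor names, thresholds, and heartbeat timeout below are our own assumptions, not any vendor's actual BMC firmware behavior.

```python
# Minimal sketch of BMC-style out-of-band monitoring logic.
# Sensor names, thresholds, and the heartbeat timeout are
# illustrative assumptions, not real BMC firmware values.

SENSOR_THRESHOLDS = {
    "cpu_temp_c": 95.0,   # assumed critical CPU temperature
    "fan_rpm": 1000.0,    # assumed minimum fan speed
}

def evaluate_sensors(readings: dict) -> list:
    """Return a list of alert strings for out-of-range sensors."""
    alerts = []
    if readings.get("cpu_temp_c", 0) > SENSOR_THRESHOLDS["cpu_temp_c"]:
        alerts.append("ALERT: CPU temperature critical")
    if readings.get("fan_rpm", 10**9) < SENSOR_THRESHOLDS["fan_rpm"]:
        alerts.append("ALERT: fan speed below minimum")
    return alerts

# Because the BMC does not depend on the host OS, a check like this
# keeps running (and can trigger a remote power cycle) even when the
# OS has crashed or has not yet loaded.
def needs_power_cycle(os_heartbeat_age_s: float, timeout_s: float = 300.0) -> bool:
    return os_heartbeat_age_s > timeout_s
```

The key point the sketch captures is independence from the host: the decision to alert or power-cycle uses only data the BMC gathers itself.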

Traditionally, the BMC runs on a proprietary chip with a proprietary OS. Communication with the BMC relied on the IPMI protocol, where each server platform has a different set of commands and tools matching its specific hardware architecture.

OpenBMC – BMC software for all hardware

At the hyperscale level, architects and administrators don’t want different server brands and models to use different BMC commands or different implementations of IPMI. They might want each server to use the same BMC setup, or need to review/modify the source code for the BMC. Thus a much better solution for large-scale cloud operations is the open source Linux-based OpenBMC, a project pioneered by Facebook that standardizes the interface with BMC hardware. This was the impetus for our implementation.

Mellanox is first in the industry with OpenBMC support for NIC FW upgrades

To support large-scale cloud operations, Mellanox implemented the first uniform NIC OpenBMC interface that solves the complexities around firmware upgrades. OCP server platforms based on OpenBMC can now pull each NIC's firmware image and perform firmware updates on Mellanox NICs.

Easier management for hyperscale – Is this a big deal?

Provisioning has always been a challenge for hyperscale data centers. So far, it has been addressed with vendor tools running on the servers. Hyperscalers, with their massive numbers of servers and network connections, are dealing with the painful inconsistency of APIs and interfaces, which leads to endless effort porting their tool sets to match the different components and drivers across their many different servers and NICs.

Bringing uniformity across devices and event/error logs is tremendously valuable, as it removes the challenge of integrating many multi-vendor, proprietary protocols. With OpenBMC, one set of operating processes, scripts, and monitoring tools can monitor, reboot, and upgrade firmware across many different types of servers.
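The uniformity argument can be made concrete with a short sketch: one update loop, one code path, regardless of server brand. The `Server` class and its methods below are hypothetical stand-ins for the real BMC transport calls.

```python
# Sketch of fleet-wide firmware updates through a uniform interface.
# The Server class is a hypothetical stand-in: in a real deployment the
# update would flow through the BMC, but the tooling no longer needs
# per-vendor branches because every server speaks the same interface.

class Server:
    def __init__(self, name: str, model: str, nic_fw_version: str):
        self.name = name
        self.model = model            # brand/model no longer matters to the tooling
        self.nic_fw_version = nic_fw_version

    def update_nic_firmware(self, image_version: str) -> None:
        # Real code would push a signed image via the BMC;
        # here we only record the resulting version.
        self.nic_fw_version = image_version

def fleet_update(servers: list, target_version: str) -> list:
    """One loop for the whole heterogeneous fleet; returns updated names."""
    updated = []
    for s in servers:
        if s.nic_fw_version != target_version:
            s.update_nic_firmware(target_version)
            updated.append(s.name)
    return updated
```

Without a uniform interface, the body of that loop would need a branch per vendor and per IPMI dialect; with OpenBMC it does not.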

Automated NIC firmware upgrades on Facebook’s OpenBMC

“We are pleased with Mellanox’s implementation of ConnectX NIC support for firmware updates through the OpenBMC. The results are impressive,” said Ben Wei, software engineer on the Facebook OpenBMC development team. “The joint development was based on a draft (Work-In-Progress) DMTF spec that specifically required close, timely cooperation with Mellanox.”

Mellanox OCP NIC – First to support PLDM over RMII Base Transport

Mellanox is a pioneer in OCP server support: our first OCP mezzanine (mezz.) 0.5 NIC, later the OCP mezz. 2.0, and now OCP NIC 3.0 cards are deployed with hyperscalers, enterprises and OEMs. Mellanox is also a leading contributor to OCP system management, system security, and NIC hardware and mechanical designs.

The OCP Hardware Management Project specifies interoperable manageability for OCP platforms and OCP NICs, leveraging DMTF standards to communicate into the BMC.

The NICs serve as the BMCs’ gateway to the outside world, through NCSI, MCTP, PLDM messaging layers and underlying physical transports (see illustration below). Mellanox is actively contributing to the DMTF system management MCTP and PLDM specs to enable more flexible and efficient remote reporting and management for OCP servers.

NICs serve as the BMCs’ gateway to the outside world through NCSI, MCTP, PLDM messaging layers


Standardizing “inside the box” management

The DMTF defines standard protocols for monitoring and control (DSP0248) and for firmware update (DSP0267). The DMTF has declared its intention to introduce standard support for Platform Level Data Model (PLDM) protocols over RMII Based Transport (RBT) as part of NC-SI (DSP0222) revision 1.2. That revision is planned for release this year; we used the published Work-In-Progress document in our implementation.
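To give a feel for what these protocols look like on the wire, here is a simplified encoder for the common three-byte PLDM message header defined in the DMTF base specification (DSP0240): a request bit, an instance ID, the PLDM type (for example, DSP0267 firmware update messages carry their own type code), and a command code. This is a sketch, not a conformant implementation; the datagram bit and header-version field are left at zero.

```python
import struct

def encode_pldm_header(request: bool, instance_id: int,
                       pldm_type: int, command: int) -> bytes:
    """Pack the 3-byte PLDM message header (simplified per DMTF DSP0240:
    the datagram bit and header-version field are kept at zero)."""
    assert 0 <= instance_id < 32      # 5-bit instance ID
    assert 0 <= pldm_type < 64        # 6-bit PLDM type
    assert 0 <= command < 256         # 8-bit command code
    byte0 = (0x80 if request else 0x00) | instance_id  # Rq bit + instance ID
    byte1 = pldm_type & 0x3F                           # hdr version 00 + type
    return struct.pack("BBB", byte0, byte1, command)
```

Whether these three bytes ride over MCTP or over RBT is invisible to the layer above, which is exactly why adding RBT as a transport (next section) required no change to the PLDM payloads themselves.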

RBT is the main management interface in OCP platforms. With PLDM over RBT, OCP platforms leverage the higher throughput of RBT (over RMII) compared to MCTP over SMBus. RBT is also more widely deployed than MCTP over PCIe, and it offers higher availability: the PCIe interface is unavailable when systems are placed into low-power states such as S3, while RBT can still operate in those states. And of course, RBT can operate over Ethernet.

Initially, PLDM protocols could only be supported over MCTP transport. By introducing support for PLDM protocols over RBT – PLDM protocols can now be used in standard OCP platforms. Mellanox supports this new ability to send PLDM information over RBT.

How is Mellanox keeping management and firmware updates secured?

The ability to automate firmware updates over the network using the BMC often raises security concerns. How does one prevent unauthorized firmware from sneaking into a NIC? Fortunately, Mellanox products support Secure Firmware Update, which ensures that only properly signed firmware images, created by Mellanox, can be programmed into Mellanox devices. Secure Firmware Update prevents malicious firmware from being loaded onto Mellanox NICs and is fully supported even when using PLDM over RBT.
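The gating logic is easy to illustrate, with one big caveat: a real secure firmware update verifies an RSA/ECDSA signature against a vendor public key anchored in the device, not a digest list. The SHA-256 allow-list below is a deliberately simplified stand-in for that cryptographic check.

```python
import hashlib

# Simplified illustration of the admission check only. In real hardware
# the device verifies a vendor signature (RSA/ECDSA) with a key it
# trusts; here a SHA-256 digest allow-list stands in for that step.

TRUSTED_DIGESTS = set()  # digests of vendor-signed images (demo assumption)

def register_signed_image(image: bytes) -> None:
    """Record an image the 'vendor' has blessed (stand-in for signing)."""
    TRUSTED_DIGESTS.add(hashlib.sha256(image).hexdigest())

def can_flash(image: bytes) -> bool:
    """Refuse any image whose digest is not on the trusted list."""
    return hashlib.sha256(image).hexdigest() in TRUSTED_DIGESTS
```

The property to take away is the same as in the real mechanism: the decision to flash is made from the image contents alone, so a tampered image is rejected no matter which transport (including PLDM over RBT) delivered it.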

Next steps in system security

The next steps being defined in DMTF and OCP will enable system security protocols for device authentication:

  • Device identification and authentication
  • Providing firmware measurement to platform RoT
  • Securing management protocols (future)

These new standards, along with Mellanox’s plans to support them, will bring further automation and standardization to large-scale datacenter operations.

That’s a lot of acronyms

DMTF – Distributed Management Task Force

For over a decade, the DMTF has been creating open manageability standards to improve the interoperable management of information technologies in servers and desktops.

PMCI – Platform Management Components Intercommunications

The DMTF working group that develops standards to address “inside the box” communication interfaces.

PLDM – Platform Level Data Model

A data model for efficient communications between platform management components. Allows access to low-level platform inventory, monitoring, and control functions including firmware update.

MCTP – Management Component Transport Protocol

A transport-layer protocol that allows running the various “inside the box” management protocols over physical interfaces other than RBT, such as SMBus and PCIe.

Mellanox Joins ODCC—Speeding Up China’s Next Generation Data Center Networks

Mellanox Joins ODCC

Mellanox has joined ODCC to form strategic partnerships with other ODCC members interested in developing powerful and cost-efficient networks, to meet China’s booming demand for data-center capacity.

The Open Data Center Committee (ODCC) is an industry-led non-profit consortium formed by China’s leading technology providers: the Web 2.0 giants Baidu, Alibaba and Tencent (BAT) and the telecom giants China Telecom, China Mobile and China Unicom, backed by Chinese government agencies.

ODCC promotes open hyperscale data center specifications for building an ecosystem that shares proven best practices and designs, and leverages economy-of-scale efficiencies among China’s large players.

As a dedicated supporter and solution developer, Mellanox is leveraging and integrating our differentiated technologies and designs into some of the world’s largest cloud data centers.

Mellanox is a leader in Ethernet networking and looks forward to collaborating with other ODCC members to advance data center architectures specifically by contributing to the Phoenix project, through RoCE enablement, and advancing the Open Ethernet Vision.


Phoenix Adopts Mellanox Spectrum 100/25G Switches

Launched in August 2017 by the ODCC Network Work Group, the goal of the Phoenix Project is to promote whitebox switching, with SONiC running as an open network operating system (NOS).

The Phoenix community is giving its blessing to a known and stable Community SONiC version and packages, which can run on approved switch platforms. Indeed, Phoenix has standardized on Mellanox SN2410 100/25G as the preferred Top-of-Rack (ToR) solution.

Mellanox is committed to sharing the 25GbE + 100GbE (SN2410) switch hardware design, and is the first switch OEM to do so. Spectrum silicon building blocks are an exciting new contribution we provide to China’s industry leaders for high-performing, scaled-out and multi-tenant network systems.


ODCC Adopts RoCE to Boost Data Center Efficiency

ODCC has recognized the crucial role RoCE plays in data center interconnects handling the data storm generated by machine learning, big data analytics and storage applications.

Mellanox is sharing with ODCC members extensive experience gathered on RoCE compatibility testing, construction of lossless networks and application level performance enhancements.


A Shared Open Ethernet Vision – Mellanox and ODCC

  1. Open Source Ecosystem – Mellanox offers a choice of Network Operating Systems to unleash differentiated Spectrum switch systems. Adopting the whitebox open principles from the server world delivers an open-source, fully interoperable and software-based ecosystem.
    Mellanox is one of the major contributors to ONIE, SAI and SONiC community software projects. The Mellanox SONiC contributions span features, code infrastructure and test frameworks; in addition, Mellanox maintains multiple repositories.
  2. High Performance – Data growth shifts the performance bottleneck to the network. Mellanox Spectrum silicon has set the record for packet rate, throughput and latency in the data center.
  3. RoCE Everywhere – Storage, deep learning and big data analytics applications are running natively on RoCE, assuring that the investment made into GPUs and CPUs is put to good use. Mellanox RoCE solutions include hardware-based offloads and intelligent congestion handling to maximize system bandwidth and CPU utilization.
  4. Simple – Modern data centers are easier to deploy and monitor. A single admin can manage thousands of nodes using scripts and networking configuration templates that are easy to scale and manage, reducing the time to troubleshoot and fall back from hours or days to minutes.
  5. Lower Power Consumption – The Chinese government incentivizes data centers to go green. Major power savings come from reducing the compute power in use: RoCE yields almost 100% CPU efficiency, further reducing the need for compute in distributed systems. In addition, Spectrum 100/25G switches hold the record for the lowest Watt per port today; the Mellanox SN2010’s average power consumption is just 57 Watts.


A Bright Future Collaborating with ODCC

As China continues to grow and shape the future of cloud and edge computing landscapes, we are very excited to collaborate with the ODCC on developing high-performance, scale-out and multi-tenant data center architectures.

Autonomous Networking For Real

Self-Driving Data Center

by Phil Clegg (Nutanix) and Elad Wind (Mellanox)

Nutanix is designed from the ground up to simplify datacenter deployment and management.

Business applications are deployed in enterprise clouds in minutes. The simple, intuitive interface allows users to easily create, modify and delete virtual workloads in a cloud-like fashion. The Nutanix PRISM interface is an HTML5 page and/or set of APIs that are already secured and locked down to a production-ready standard.

NEO – You don’t have to be a network guru to build infrastructure

In keeping with this trend, Mellanox partnered with Nutanix to provide network APIs that allow the Mellanox NEO Management and Monitoring Tool to fully understand the virtual networking and run autonomous processes for network configurations.

4 hours a day reduced to just a few minutes a week

Consider the example of a New Zealand-based service provider who wanted a secured platform to deliver network function virtualization. Here, the firewall VMs are spun up via a portal that creates a VM, provides its configuration and assigns it to a customer’s VLAN.

The NEO plugin listens to Nutanix events and reacts so that VLANs are provisioned and terminated transparently.
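The event-driven pattern the plugin follows can be sketched as below. The event shapes and the `TorSwitch` model are our hypothetical simplifications; the real plugin consumes Nutanix events and configures Mellanox switch ports through NEO's APIs.

```python
# Sketch of event-driven VLAN automation in the spirit of the NEO plugin.
# Event dictionaries and the TorSwitch model are hypothetical stand-ins
# for the real Nutanix event stream and NEO switch-configuration APIs.

class TorSwitch:
    def __init__(self):
        self.port_vlans = {}  # port name -> set of allowed VLAN IDs

    def allow_vlan(self, port: str, vlan: int) -> None:
        self.port_vlans.setdefault(port, set()).add(vlan)

    def revoke_vlan(self, port: str, vlan: int) -> None:
        self.port_vlans.get(port, set()).discard(vlan)

def handle_event(switch: TorSwitch, event: dict) -> None:
    """Provision or terminate VLANs transparently as VMs come and go."""
    if event["type"] == "vm_created":
        switch.allow_vlan(event["port"], event["vlan"])
    elif event["type"] == "vm_deleted":
        switch.revoke_vlan(event["port"], event["vlan"])
```

Because the handler reacts to the orchestration events themselves, the physical ToR configuration can never drift out of sync with the virtual topology, which is what eliminates the e-mail-and-wait workflow described next.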

Prior to this automation, VLANs were manually configured via an e-mail request triggered by the orchestration engine. These VLAN requests had a three-day SLA and often took in excess of six days to provision and to resolve any issues.

Simply moving to Mellanox switches and enabling the Mellanox NEO plugin changed this process: VLANs are automatically created, recorded and deployed at both the virtual and physical layers, removing the risk of misconfiguration and the hours employees spent doing this at the top-of-rack infrastructure.

Because everything is logged, it is easy to track the compliance needs of VLAN creation as well as the movement of VLAN settings as VMs move around the Nutanix cluster.

This is estimated to reduce the effort of provisioning VLANs from 4 hours per day to less than 1 hour per week, removing over 90 percent of the rework associated with manual VLAN creation.

NEO and Prism Dashboard

PRISM is an HTML5 interface, with no Flash or Java plugins, used to administer a Nutanix cluster. This intuitive interface allows users to analyze and alert, as well as create, modify and delete all facets of virtualization, including virtual machines, CPU, RAM, storage, networking, snapshots, replication and self-service portals.

The networking portion of Nutanix is not only virtual-switch and VM-based; it also integrates with switch hardware like Mellanox’s and can surface hardware-based networking stats within the PRISM interface. Similarly, NEO adds another window with insights into what’s running inside the fabric, simplifying application development and accelerating application delivery.

About Nutanix One-Click Infrastructure

Hypervisor and VM management provide a consumer-grade experience. Nutanix Prism gives administrators an easy way to manage virtual environments running on Acropolis. It simplifies and streamlines common workflows for hypervisor and virtual machine (VM) management; from VM creation and migration to virtual network setup and hypervisor upgrades. Rather than replicating the full set of features found in other virtualization solutions, virtualization management in Prism has been designed for an uncluttered, consumer-grade experience.

About Mellanox Ethernet Switches

Mellanox’s Spectrum switch technology offers data centers unparalleled performance, letting providers and customers focus only on their applications: Mellanox networks scale easily, offer consistently low latency, run at full wire speed at all packet sizes with zero packet loss, and have up to 15X better microburst resiliency.


The New Mellanox/Cumulus SN2100: A Revolutionary Approach to Top-of-Rack Switching

Does your Data Center / Cloud run racks with 40 or more servers? Then you are probably paying more than you should for your network, and you are probably consuming too much real estate and power. With Mellanox’s SN2100 Top of Rack (ToR) switch you can change all that.

Web-scale companies have created major shifts in data centers by migrating to modular solutions comprised of flexible, dense and economical building blocks. The latest addition to Mellanox’s Spectrum family is part of this wave: an amazing half-width switch carrying sixteen (16) QSFP28 ports of 10/25/40/50/100GbE.

For this collection of storage and cloud example environments, the SN2100 ToR sets new standards for flexibility, efficiency and price performance. Prices are taken from public listings so customers can draw their own conclusions and drive the discussion.

Example 1: Popular 48+4-port ToR configuration: the SN2100 ToR with split ports saves thousands of dollars per rack compared to leading solution providers.

Why use port splits? SN2100 ports split into quad SFP28 ports with Mellanox LinkX® breakout cables. This configuration connects up to 48 nodes running at 10G/25G speeds while keeping four (4) uplink ports of 40G/100G. Splitting ports brings Web-scale innovation and savings into your design, as already adopted by hyperscalers: 1) simpler cable administration, 2) a clear and tidy rack, and 3) 25 percent savings on cables.

Each breakout cable replaces four 10G DAC cables and saves $50. Total cable savings for a common highly available rack of 40+ nodes is more than $1,000. DAC breakouts are commonplace in the data center, with MTBFs exceeding 2,000 years, so customers will rarely, if ever, need to replace them.
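The savings math above can be worked through explicitly. One assumption is ours: each node in a highly available rack is dual-homed to two ToR switches, which is how 40 nodes yield the figure quoted.

```python
# Worked version of the cable-savings arithmetic from the text.
# The dual-homing assumption (two links per node for HA) is ours;
# the $50-per-breakout saving and 4:1 replacement come from the article.

def breakout_savings(nodes: int, links_per_node: int = 2,
                     ports_per_breakout: int = 4,
                     saving_per_breakout: float = 50.0) -> float:
    links = nodes * links_per_node            # total server-facing links
    breakouts = links // ports_per_breakout   # each breakout replaces 4 DACs
    return breakouts * saving_per_breakout

# A highly available rack of 40 dual-homed nodes:
# 80 links -> 20 breakout cables -> $1,000 saved on cabling alone.
```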

Example 2: Storage / SDS / hyperconverged systems: Smaller compute and storage applications typically fill 3-4 rack units (RU). The half-width SN2100 is uniquely packaged: two adjacent switches, mLAG’d (logically link-aggregated), fit in just one additional RU for the best IOPS/RU ratio and maximum Gbps/dollar performance.

A compact network need only occupy a fraction of the prevailing 48+4-port switch. Why pay 3x for ports you don’t need? Why pay for fancy L3 licenses when state-of-the-art L2+L3 MLNX_OS caters for all your network needs? Conclusion: two SN2100 switches mLAG’d address an entire storage solution, with no wasted ports or rack space, saving $30,000 on networking gear per appliance.

Example 3: Leaf and spine with 100G aggregation layer: The SN2100 ToR leverages 100G bandwidth in the aggregation layer to save money over prevailing 40G networks. Do the math: 100G means pipes run 2.5x faster than 40G. Fast pipes translate into fewer optical cables toward the spine, fewer spines and less rack space used overall.
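The "fewer cables" claim reduces to simple arithmetic. The 400G-per-rack uplink requirement below is an assumed example figure for illustration, not taken from the article.

```python
import math

# How many uplinks does a given aggregate bandwidth require?
# The 400G-per-rack requirement is an assumed example, not a spec.

def uplinks_needed(required_gbps: float, link_gbps: float) -> int:
    """Uplink cables needed to carry required_gbps toward the spine."""
    return math.ceil(required_gbps / link_gbps)

# For an assumed 400G of uplink bandwidth per rack:
#   40G links:  10 cables toward the spine
#   100G links:  4 cables (2.5x the speed means 2.5x fewer links)
```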

A fast, reduced network saves thousands of dollars per rack in gear, and saves again in operating costs. The math is as compelling for small 10G networks as it is for scaled spine-and-leaf architectures.

Other integration ideas for the SN2100 include deployments outside the traditional confines of brick-and-mortar buildings, such as field or mobile data center operations.

Mellanox’s SN2100 Open Ethernet switch comes with two pre-configured ONIE-based Network Operating System (NOS) options: Cumulus Linux and MLNX-OS. Open-source standards and cloud management platforms help organizations reduce vendor lock-in and are a viable option for software-defined data centers. Speed can be limited to 40G for an even better price.

Register now for the webinar on June 28 to do the math with the new breed of ToR that delivers network flexibility and efficiency while reducing costs.