Leveraging NATS and OpenTelemetry in u-bmc for Enhanced Data Center Operations

Main Room

Historically, BMC firmware has always needed special care from DevOps engineers and SREs, but does that have to be the case? In this talk we will explore a way to turn your data center's BMCs into yet another software component and demystify the entire management stack of modern server hardware.

This talk will cover new design decisions in u-bmc that aim to address common shortcomings by looking at server management not from an OxM or manufacturer perspective, but from the perspective of the end users of such stacks, who are often data center DevOps engineers and SREs. We will cover the development that has happened in u-bmc since the last talk at OSFC 2022, and how NATS and OpenTelemetry, two technologies alien to the average embedded system, can play a role in current and future data center management and monitoring at scale.

While NATS is better known from high-speed trading systems, NASCAR telemetry systems, and Matrix chat servers, it showed properties that made it an ideal choice for connecting embedded systems like BMCs within a larger installation. And while Redfish is a great standard for covering hardware management capabilities, it feels very different from how regular cloud workloads are monitored. Adding OpenTelemetry monitoring capabilities to the BMC firmware turns the BMC, and therefore the server's hardware, into yet another service to monitor from an SRE's point of view.