Christopher Covington is a Production Engineer working to configure, monitor, and deploy updates to Facebook's OpenBMC fleet. He previously worked at Qualcomm, enabling .rpm-based Linux distributions on ARMv8 servers and teleporting running Linux benchmarks (using Checkpoint/Restore In Userspace, CRIU) from relatively fast QEMU system emulation to slower CPU and SoC performance models.
Facebook has been working on an open source Board Management Controller (BMC) solution since 2014. This presentation examines several specific problems discovered as usage of the embedded Linux distribution has grown.
Out of Memory in 1 to 60 Days, or Why to Engage Upstream and Rebase Often
Two memory leaks, one in Linux v2.6 and the other in rsyslog, and how they were fixed upstream.
The Pain of Passwords, or Why to Invest In Security
Several shortcomings of passwords we have encountered, how to set up SSH Trusted CA and Authorized Principals, and password and key rotation considerations for image update and configuration mechanisms.
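As a rough illustration of the SSH Trusted CA and Authorized Principals setup mentioned above, a minimal server-side fragment might look like the following. It assumes OpenSSH's sshd; the file paths are hypothetical placeholders, not the ones used in Facebook's OpenBMC images.

```
# /etc/ssh/sshd_config (fragment; paths are illustrative assumptions)

# Accept user certificates signed by any CA public key listed in this file.
TrustedUserCAKeys /etc/ssh/trusted_ca.pub

# Per-user file listing the certificate principals allowed to log in as
# that user (%u expands to the target user name).
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
```

With a setup along these lines, rotating access means reissuing short-lived certificates or editing the principals files, rather than changing a password or an authorized_keys entry on every BMC.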
Unresponsive Endpoints, or Why to Architect and Test for Resilience
Communication failures observed between the bootloader and the BMC over IPMI, and between BMC processes over Unix sockets; the code and system design changes that improved things; and how testing can screen for these issues.
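One generic way to keep a client from hanging on an unresponsive Unix socket peer is a client-side timeout. The sketch below is only an illustration of that idea, not the change made in OpenBMC; the function name `query_unix_socket`, the 4096-byte read size, and the default timeout are all assumptions.

```python
import socket


def query_unix_socket(path, request, timeout_s=2.0):
    """Send `request` to the Unix socket at `path` and return the reply bytes.

    Returns None if the peer is missing, refuses the connection, or does not
    answer within `timeout_s` seconds, so the caller can retry or degrade
    instead of blocking forever. (Hypothetical helper, for illustration only.)
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        # The timeout applies to connect, send, and recv individually.
        sock.settimeout(timeout_s)
        try:
            sock.connect(path)
            sock.sendall(request)
            return sock.recv(4096)
        except OSError:
            # Covers timeouts as well as missing or refused sockets.
            return None
```

A test can screen for this class of issue by pointing the client at a socket whose server listens but never replies, and checking that the call returns within the timeout.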