Sam worked for 10 years as an embedded software engineer in the precision agriculture industry, writing network stack and application code for ISO11783/J1939-conforming devices. She recently started working as a Production Engineer at Facebook, where she is learning to apply her knowledge of embedded Linux systems to the many “at scale” problems of managing a fleet of embedded Linux devices in a production network.
Facebook has been working on an open source Baseboard Management Controller (BMC) solution since 2014. This presentation examines several specific problems that surfaced as use of its embedded Linux distribution grew.
Out of Memory in 1 to 60 Days, or Why to Engage Upstream and Rebase Often
Two memory leaks, one in Linux v2.6 and the other in rsyslog, and how they were fixed upstream.
The Pain of Passwords, or Why to Invest in Security
Several shortcomings of passwords that we encountered, how to set up SSH trusted user CA keys and authorized principals, and password and key rotation considerations for image update and configuration mechanisms.
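As a minimal sketch of an SSH trusted CA and authorized principals setup, assuming stock OpenSSH (the talk does not say which tooling it uses), the helpers below generate the two sshd_config directives involved (`TrustedUserCAKeys`, `AuthorizedPrincipalsFile`), the contents of a per-account principals file, and the `ssh-keygen` command that signs a user key. The paths, the `bmc-admin` principal, and the helper names are hypothetical.

```python
from typing import List

# Hypothetical paths; the layout on a real BMC image may differ.
CA_PUBKEY_PATH = "/etc/ssh/user_ca.pub"
PRINCIPALS_DIR = "/etc/ssh/auth_principals"


def sshd_config_fragment(ca_pubkey_path: str = CA_PUBKEY_PATH,
                         principals_dir: str = PRINCIPALS_DIR) -> str:
    """sshd_config lines: trust user certificates signed by the CA, and look up
    the allowed principals for each local account in principals_dir/<user>."""
    return (
        f"TrustedUserCAKeys {ca_pubkey_path}\n"
        f"AuthorizedPrincipalsFile {principals_dir}/%u\n"  # %u expands to the login name
    )


def principals_file(principals: List[str]) -> str:
    """Contents of one account's principals file: one principal per line."""
    return "".join(f"{p}\n" for p in principals)


def sign_command(ca_key: str, user_pubkey: str, identity: str,
                 principals: List[str], validity: str = "+1d") -> List[str]:
    """Argument list for OpenSSH ssh-keygen to sign a user public key.

    -s CA private key, -I certificate identity (logged by sshd on login),
    -n comma-separated principals, -V validity interval.
    """
    return ["ssh-keygen", "-s", ca_key, "-I", identity,
            "-n", ",".join(principals), "-V", validity, user_pubkey]


if __name__ == "__main__":
    print(sshd_config_fragment(), end="")
    print(principals_file(["bmc-admin"]), end="")
    # Pass this list to subprocess.run() on a host that holds the CA private key.
    print(" ".join(sign_command("user_ca", "id_ed25519.pub", "alice", ["bmc-admin"])))
```

With this arrangement, devices only need the CA public key and the principals files; access is granted by signing certificates rather than by distributing passwords or individual public keys. A short `-V` interval limits how long a leaked certificate stays useful, at the cost of re-signing more often. Because the `TrustedUserCAKeys` file may list more than one key, an old and a new CA can be trusted side by side while a CA rotation rolls out through image updates.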
Unresponsive Endpoints, or Why to Architect and Test for Resilience
Communication failures observed between the bootloader and the BMC over IPMI, and between BMC processes over Unix domain sockets; the code and system design changes that made these paths more robust; and how testing can screen for such issues.
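One common way to keep an unresponsive peer from hanging its caller is to bound every blocking socket call with a timeout and retry with backoff. The Python sketch below illustrates that idea for a Unix domain socket client. The request/response framing (the client half-closes after sending, the server replies and then closes) and the `request` helper are assumptions for illustration, not the protocol used by the BMC daemons in the talk.

```python
import socket
import time
from typing import Optional


def request(path: str, payload: bytes, timeout: float = 2.0,
            retries: int = 3, backoff: float = 0.1) -> Optional[bytes]:
    """Send one request to a Unix domain stream socket and return the reply.

    Each individual blocking call (connect, send, recv) is bounded by
    `timeout`, so an unresponsive peer produces an error instead of hanging
    the caller indefinitely. Returns None if every attempt fails.
    """
    for attempt in range(retries):
        try:
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
                s.settimeout(timeout)
                s.connect(path)
                s.sendall(payload)
                s.shutdown(socket.SHUT_WR)  # hypothetical framing: end of request
                chunks = []
                while True:
                    data = s.recv(4096)
                    if not data:  # peer closed: reply complete
                        break
                    chunks.append(data)
                return b"".join(chunks)
        except OSError:  # covers timeouts, refused connections, missing socket
            if attempt < retries - 1:
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    return None
```

For testing, an unresponsive endpoint is cheap to simulate on Linux: a socket that is listening but never calls `accept()` lets `connect()` and the send succeed while `recv()` blocks, which exercises the timeout path.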