NIC monitoring and management in OpenBMC

BMC,

The NIC continues to be a single point of failure for our platforms. This is especially true for multi-host platforms, where a single multi-host NIC is managed by one BMC. On the Tioga Pass and Yosemite V2 platforms, we have added a user-space daemon, ncsid, that monitors and manages the NIC through NC-SI. One of its primary responsibilities is to monitor link status and perform automatic recovery and remediation as needed. We have also been working with NIC vendors to add OEM NC-SI commands and OEM AENs for our use case. In this talk I will discuss how ncsid monitors NIC status, some of the work we did with NIC vendors to help the BMC manage the NIC, and other command-line tools we use in OpenBMC to debug NIC issues.

Resources: