The University of Alaska receives network services from ACS. As the University’s ISP, ACS provides the long-haul WAN circuits that make up the University of Alaska’s core network connecting Fairbanks, Anchorage, Juneau, Seattle, and Portland. The Fairbanks <-> Seattle and Anchorage <-> Portland circuits provide the University’s connection to Internet2, AWS, and some commodity internet sites.
On October 13th around 1000, one of the ACS fibers that supply Wide Area Network (WAN) connectivity (to Anchorage and Juneau) and Commodity Internet (CI) connectivity to the University of Alaska Fairbanks (UAF) Main Campus began generating faults and errors, resulting in lost and delayed packets. This degraded internet performance for the vast majority of users at UAF and for consumers of UAF and UA Statewide Fairbanks Information Technology services (including Data Center resources).
Restore services as soon as possible.
One of the two ACS fibers supplying WAN and CI to the Fairbanks Main Campus became severely impaired, resulting in degraded performance and packet loss. Due to the nature of the issue, the exact problem was not immediately clear. The high number of errors overwhelmed the ACS router, which in turn compromised the device’s error reporting. ACS believed that the device needed a reboot to clear the reporting errors and what was assumed at the time to be a software bug in the ACS router. An emergency maintenance outage was published for Oct 14th at 0100. During this window the ACS router was rebooted. This cleared the reporting errors but did not fix the underlying issue. Once proper error reporting was restored, ACS engineers could more clearly see the errors on one of the fibers feeding the campus. To stabilize services, this fiber was disabled, rerouting all traffic onto the single remaining fiber. This restored UA to proper functional status until the fiber could be repaired. Full service was restored Oct 21st at 0100, when ACS and UA engineers re-enabled the repaired fiber and tested functionality.
This issue was caused by a hardware fault in ACS’ gear, so there are few countermeasures that we can take to prevent it from happening again, short of provider diversification, which has historically been cost-prohibitive.