Phone Outage
Incident Report for OIT Services
Postmortem

Problem Impact Analysis

Event Occurrence: 29 June 2022 13:40

Background

Description of the service.  Why is it important, and why do we care that it had a problem?  Include environmental details that are relevant to the outage.

Campus VoIP phone system

Breakdown of the Problem

Describe what happened and the problem we want to fix.  When and how was it reported?  Actions taken to restore service.  Duration of the outage and the time that service was restored.  Also describe any workarounds that may have been used during the outage.

During a power outage scheduled by Facilities Services in the MBS complex, the UPS batteries were drained, causing the network equipment in that facility to go down.  When power was restored, the network equipment recovered except for one fiber optic circuit to the switch that provides network connectivity to the VoIP network's DHCP server.  Because the DHCP server was unreachable, the VoIP phones could not renew their DHCP leases; as each lease expired, the phone dropped its IP address and was effectively offline.  The issue was identified and resolved at approximately 14:20.
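
The lease expiry behavior described above can be illustrated with a minimal sketch. It assumes the phones use the RFC 2131 default timers (renewal at 50% of the lease, rebinding at 87.5%) and that a client renews on its first attempt once the server is reachable. The function names, the 30-minute lease, and the timestamps are hypothetical values chosen for illustration; the report does not state the actual lease duration.

```python
from datetime import datetime, timedelta


def lease_timeline(lease_start, lease_seconds):
    """Return the RFC 2131 default milestones for one DHCP lease."""
    lease = timedelta(seconds=lease_seconds)
    return {
        "renew_t1": lease_start + lease * 0.5,     # client starts unicast renewal
        "rebind_t2": lease_start + lease * 0.875,  # client falls back to broadcast
        "expiry": lease_start + lease,             # client must stop using the address
    }


def offline_at(lease_start, lease_seconds, server_down_from, server_up_at):
    """Return when the client loses its address, or None if this lease is renewed.

    Simplification (assumption): any renewal attempt made while the server
    is reachable succeeds immediately.
    """
    t = lease_timeline(lease_start, lease_seconds)
    if server_down_from > t["renew_t1"]:
        # Renewed at T1 before the server went away; the new lease
        # would need its own check.
        return None
    if server_up_at < t["expiry"]:
        # A retry after the server came back succeeded before expiry.
        return None
    return t["expiry"]


# Hypothetical example: 30-minute lease obtained at 13:00,
# DHCP server unreachable from 13:10 to 14:10.
start = datetime(2022, 6, 29, 13, 0)
print(offline_at(start, 1800,
                 datetime(2022, 6, 29, 13, 10),
                 datetime(2022, 6, 29, 14, 10)))
# → 2022-06-29 13:30:00
```

Because each phone obtained its lease at a different time, phones would drop offline gradually over one lease period rather than all at once.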

Target State / Goal 

Describe how the service should operate to provide an expected service level to the customer.

The VoIP DHCP service should be available 24/7, except during scheduled maintenance.

Root Cause Analysis 

Detailed and technical description of the problem.  Include the events that caused the outage, including failures in hardware, software, environment, processes and procedures.

A faulty fiber optic module was found in the core switch in MBS, on the circuit that provides service to the access switch in the facility.

Develop Countermeasures 

Describe actions to take to prevent future occurrences of this outage and improve service provisioning.

The faulty fiber optic module was replaced.  The connectivity of the access switch is being reengineered to provide redundant connectivity to the campus network core, in an effort to eliminate future outages of this kind.
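
The effect of the redundancy countermeasure can be checked with a small sketch that lists every link whose failure alone would cut a path. The node names and both topologies below are hypothetical simplifications for illustration, not the actual campus design.

```python
from collections import deque


def reachable(links, src, dst):
    """Breadth-first search over undirected links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(links, src, dst):
    """Return each link whose removal alone disconnects src from dst."""
    return [
        link
        for i, link in enumerate(links)
        if not reachable(links[:i] + links[i + 1:], src, dst)
    ]


# Hypothetical topology before the change: one path to the access switch.
before = [("campus-core", "mbs-core"), ("mbs-core", "mbs-access")]
# Hypothetical topology after the change: a second uplink is added.
after = before + [("campus-core", "mbs-access")]

print(single_points_of_failure(before, "campus-core", "mbs-access"))
# → [('campus-core', 'mbs-core'), ('mbs-core', 'mbs-access')]
print(single_points_of_failure(after, "campus-core", "mbs-access"))
# → []
```

In the single-path case, every link is a single point of failure, which is consistent with one faulty module taking the DHCP server offline.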

Implementation of Countermeasures

Schedule for the implementation of proposed actions.

29 June: Faulty fiber optic module located and power cycled, restoring connectivity.

  Batteries in MBS UPS refreshed in order to extend uptime.

30 June: Faulty fiber optic module replaced.

Follow Up / Review

Describe follow-up to ensure that actions are implemented and that the expected improvements have been met.

Circuit redundancy is to be implemented as soon as possible to provide redundant connectivity to the access switch.

Posted Jul 13, 2022 - 10:14 AKDT

Resolved
This incident has been resolved. Thank you for your patience, and have a good afternoon.
Posted Jun 29, 2022 - 14:22 AKDT
Monitoring
We have implemented a fix and are monitoring the results.
Posted Jun 29, 2022 - 14:10 AKDT
Investigating
We are currently investigating multiple reports of phones no longer working and displaying only 'Configuring IP'. We thank you for your patience and will post updates as soon as we find out more.
Posted Jun 29, 2022 - 13:40 AKDT