Root Cause Analysis
May 31st, 2018
Problem Description & Impact
On May 31st, beginning at 12:13am PDT, Okta experienced a broad service disruption to US Cell 6. All external attempts to connect to both the application and its underlying cloud computing infrastructure were unsuccessful. All Administration, End-User, and API use-cases were impacted until the issue was confirmed resolved at 12:45am PDT.
Okta worked with our cloud computing and networking infrastructure provider to investigate the root cause and determined that the provider experienced a significant infrastructure outage. The outage impacted service across all availability zones for US Cell 6, resulting in the service disruption.
Okta’s monitoring immediately caught and alerted us to the service disruption at 12:13am PDT. Once the disruption was confirmed, we immediately began implementing our Disaster Recovery plan for US Cell 6. However, before the Disaster Recovery failover process was completed, network connectivity was restored, and by 12:46am PDT automated system recovery processes had returned US Cell 6 to normal operations.
Following the service disruption, Okta conducted a full RCA investigation with our cloud computing and infrastructure provider, who identified the underlying root cause and has implemented corrective measures to prevent this issue from recurring.