Root Cause Analysis - Service Disruption - 05/06/2018
Published: May 10, 2018   -   Updated: Jun 22, 2018

Root Cause Analysis:
Service Degradation
May 6, 2018


Problem Description & Impact

On Sunday, May 6th, 2018, beginning at approximately 7:03pm PDT, Okta experienced a service degradation in US Cell 2, during which admins in that cell may have experienced slightly elevated error rates.
Administrators and integrations making API update calls would also have experienced an extended Read-Only mode until the issue was fully resolved at 7:45pm PDT. End user authentication was not affected during this time.
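
For integrations that issue API update calls, one common way to ride out a Read-Only window is to retry the write with exponential backoff. The following is only a minimal sketch under stated assumptions, not Okta's recommended client behavior: `ReadOnlyModeError` and `retry_write` are hypothetical names introduced here for illustration, and how a real client detects Read-Only mode (status code, error body) is left to the caller.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ReadOnlyModeError(Exception):
    """Hypothetical error: the service rejected a write because it is read-only."""


def retry_write(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying with exponential backoff on ReadOnlyModeError.

    `operation` is assumed to raise ReadOnlyModeError when the write is
    rejected due to Read-Only mode; any other exception propagates at once.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ReadOnlyModeError:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller queue or report the write.
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ValueError("max_attempts must be at least 1")
```

Because the Read-Only window in this incident lasted roughly 42 minutes, a handful of short retries would not have been enough on its own; a caller would likely pair a helper like this with a durable queue so that writes can be replayed once the service accepts updates again.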

Root Cause

The service degradation was the result of a hardware failure in the primary database infrastructure. The Read-Only mode occurred during the database primacy change, as part of the failover to the secondary database tier.

Mitigation Steps and Recommended Future Preventative Measures

At approximately 7:03pm PDT, Okta’s proactive monitoring alerted on Read-Only mode operation in US Cell 2. The Okta operations team responded to the problem and acted quickly to route traffic back to the primary database infrastructure. Authentication requests that were in flight during this time were always successful. To prevent this issue from recurring, Okta worked with Amazon Web Services to identify and mitigate the affected hardware infrastructure components. Okta is also looking to review and enhance its recovery procedures to minimize impact when such failures are encountered.