Root Cause Analysis:
Extended Read-Only and Elevated Error Rates
June 19, 2017
Problem description & Impact
On Monday, June 19, 2017, at 8:34 pm PDT, Okta experienced a minor service disruption in US Cell 2, during which a subset of admins may have experienced slightly elevated error rates. Administrators and integrations making API update calls would also have experienced an extended Read-Only mode until the issue was fully resolved at 9:08 pm PDT.
The issue occurred during planned US Cell 2 database maintenance (planned read-only with an expected maximum duration of 15 minutes). During the planned maintenance, there was a delay in processing a database migration step, which triggered monitoring alerts. As the threshold for the planned read-only mode had been exceeded, Okta reverted the change and returned the service to normal at 9:08 pm PDT.
Mitigation steps and future preventative measures
Okta has identified and corrected the script/process error which triggered the extended Read-Only mode and has implemented changes to prevent this issue from recurring.