Root Cause Analysis:
Problem Description & Impact:
March 7, 2018
On March 7th, beginning at 9:35 AM PST, Okta experienced a feature disruption to API Access Management in the US Preview cell. During this event, end users attempting to access endpoints for this feature may have received an HTTP 500 error. Okta took corrective action to address the issue, and the API Access Management feature was restored to all entitled customers in the US Preview cell by 1:50 PM PST.
Root Cause:
The disruption occurred while Okta was correcting the assignment of the API Access Management feature for certain tenants. During that work, Okta mistakenly removed the feature from orgs that were authorized to have it.
Mitigating Steps & Corrective Actions:
After the feature was inadvertently disabled, Okta began work to bulk re-assign the feature to all entitled tenants. However, due to the complexity of the process for enabling the API Access Management feature, it took Okta longer than usual to restore the feature to entitled tenants. During the incident, while bulk enablement was processing, Okta Support and Engineering manually enabled the feature for all customers who reported the issue or were detected as impacted through Okta’s monitoring tools.
By 10:30 AM PST, the majority of the impacted tenants had the API Access Management feature reassigned. By 1:50 PM PST, all entitled customers in the US Preview cell had the feature restored.
To prevent recurrence, Okta has undertaken the following actions:
- Okta has added additional oversight to the feature deployment processes.
- Okta will be hardening its management software to reduce the risk of data input errors.