Root Cause Analysis: Feature Disruption 04/18/2017
https://support.okta.com/help/oktaarticledetailpage?childcateg=&id=ka02a000000xaqfsak&source=documentation&refurl=http%3a%2f%2fsupport.okta.com%2fhelp%2fdocumentation%2fknowledge_article%2froot-cause-analysis-feature-disruption-04-18-2017
Published: Apr 20, 2017   -   Updated: Apr 20, 2017
Root Cause Analysis: 
April 18, 2017 – Okta Feature Disruption (SMS Delays)

Problem Description & Impact: 
On Tuesday, April 18, 2017, at approximately 10:15am PDT, Okta experienced an issue in all cells that caused users to experience extended delays in receiving SMS messages for multi-factor authentication, password reset, and SMS factor registration. Queued SMS messages were significantly delayed; latency improved over time until the issue was fully resolved at 11:45am PDT. All other multi-factor authentication methods were unaffected by this disruption.

Root Cause: 
At approximately 10:15am PDT, one of Okta's redundant SMS providers failed because of a data center failure within that provider. SMS retries were directed to our alternate SMS provider, where Okta encountered an overload condition caused by the significantly increased traffic.
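
The failure mode described above, where one provider goes down and the entire load shifts to an alternate that cannot absorb it, can be illustrated with a minimal sketch. It assumes a hypothetical priority-ordered spill-over router in which each provider has a fixed per-minute capacity. The provider names, capacities, and the `route` function are all illustrative and are not Okta's actual routing logic, which this report does not describe.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str          # hypothetical provider identifier
    capacity: int      # hypothetical max SMS messages per minute
    healthy: bool = True


def route(load: int, providers: list[Provider]) -> tuple[dict[str, int], int]:
    """Assign `load` messages/minute to healthy providers in priority order.

    Each healthy provider takes as much as it can (up to its capacity) before
    the remainder spills to the next one. Returns the per-provider assignment
    and the number of messages that exceed all healthy capacity and must queue.
    """
    assignment: dict[str, int] = {}
    remaining = load
    for p in providers:
        if not p.healthy:
            continue  # skip failed providers, e.g. one with a data center outage
        sent = min(remaining, p.capacity)
        assignment[p.name] = sent
        remaining -= sent
    return assignment, remaining


if __name__ == "__main__":
    primary = Provider("provider_a", capacity=1000, healthy=False)  # simulated outage
    # Undersized alternate: most of the load has to queue.
    print(route(800, [primary, Provider("provider_b", capacity=300)]))
    # Alternate with the same capacity as the primary: no backlog.
    print(route(800, [primary, Provider("provider_b", capacity=1000)]))
```

Under these assumed numbers, the undersized alternate leaves 500 messages per minute queued, while an equal-capacity alternate absorbs the full 800. This is the same concern that remediation item 1 (equal capacity across failover providers) addresses.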

Mitigation Steps: 
Shortly after detection and assessment of the SMS delivery issue, Okta took steps to redistribute SMS traffic to a redundant SMS provider.  Rebalancing work continued over the course of the event until the SMS provider service was fully restored.

Okta is taking the following steps to prevent this issue from occurring again:

  1. Okta is re-engineering its SMS integration to ensure that all failover providers have the same capacity. The new integration will also enhance our load-balancing capabilities and improve our monitoring and diagnostic tools.
    ETA: by 5/31/2017

  2. Okta is adding new operational alerting based on end-user SMS retry behavior. This additional alerting will improve early-warning detection of, and response to, SMS-related issues in the future.
    ETA: by 4/30/2017

  3. Okta is implementing additional internal test tools so that Support and Engineering staff can quickly identify and respond to SMS service provider issues.
    ETA: by 5/31/2017
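
Item 2 above describes alerting driven by end-user SMS retry behavior. The sketch below shows one way such a signal could be computed. It assumes a hypothetical metric: the share of SMS requests that are user-initiated retries within a sliding time window. The window length, threshold, minimum sample size, and the `RetryRateMonitor` class are all illustrative choices, not the signal or thresholds Okta actually uses.

```python
from collections import deque


class RetryRateMonitor:
    """Hypothetical sliding-window monitor for end-user SMS retry rate.

    Alerts when the fraction of SMS requests that are retries exceeds
    `threshold` within the last `window_seconds`. All defaults are
    illustrative assumptions.
    """

    def __init__(self, window_seconds: float = 300.0, threshold: float = 0.2,
                 min_events: int = 50):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_events = min_events  # avoid alerting on tiny samples
        self.events: deque = deque()  # (timestamp, is_retry) pairs, oldest first
        self.retries = 0              # running count of retries in the window

    def record(self, timestamp: float, is_retry: bool) -> None:
        """Record one SMS request and drop events that fell out of the window."""
        self.events.append((timestamp, is_retry))
        if is_retry:
            self.retries += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Remove events at least `window_seconds` old, keeping the retry count in sync.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, was_retry = self.events.popleft()
            if was_retry:
                self.retries -= 1

    def retry_rate(self) -> float:
        return self.retries / len(self.events) if self.events else 0.0

    def should_alert(self) -> bool:
        return len(self.events) >= self.min_events and self.retry_rate() > self.threshold


if __name__ == "__main__":
    monitor = RetryRateMonitor(window_seconds=60, threshold=0.2, min_events=10)
    for t in range(10):
        monitor.record(float(t), is_retry=False)
    print(monitor.should_alert())  # False: no retries yet
    for t in range(10, 15):
        monitor.record(float(t), is_retry=True)
    print(round(monitor.retry_rate(), 3), monitor.should_alert())  # 0.333 True
```

A retry-based signal reflects what end users actually experience: a spike in retries appears even when the provider reports messages as accepted but delivers them late. That property is what makes it useful as an early warning in a delay scenario like this one.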
