Root Cause Analysis
SMS Delivery Delays - Aug 31st, 2017
Problem Description & Impact:
On Thursday, Aug 31, 2017 at approximately 08:35am PDT, Okta began experiencing delays in SMS message delivery for MFA, password reset, and SMS factor registration. The issue predominantly impacted users in US Cell 3, but a small number of SMS message delivery delays were also observed in other US Cells. Upon retry, automated failover to Okta’s secondary SMS vendor would have occurred. However, both vendors experienced SMS delays, and as a result, messages for a subset of customers may not have been sent. All SMS delays had fully dissipated by 10:45am PDT.
All other secondary authentication factors, such as Okta Verify, Voice-Call, Duo, Yubikey, Google Authenticator, and Security Question, were not affected.
At roughly 08:35am PDT, Okta’s proactive monitoring began alerting us to a growing SMS message queue. Upon further investigation, we found that both our primary and secondary SMS vendors had experienced issues delivering SMS messages for a subset of requests. This resulted largely in delivery delays, with some end-users not receiving any SMS message. Okta’s SMS retry logic alternated retry requests between the SMS vendors until the issue was resolved at the SMS provider, and all SMS message delivery delays were resolved by 10:45am PDT. US Cell 3 was predominantly impacted due to the primacy of the SMS providers configured within that cell.
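As an illustration only, the alternating retry behavior described above can be sketched as follows. This is a minimal sketch and not Okta's actual implementation: the function name `send_with_alternating_retry`, the `Sender` callable type, and the `max_attempts` parameter are all hypothetical, and each vendor is assumed to be wrapped in a callable that returns `True` on successful delivery.

```python
from typing import Callable, Sequence

# Hypothetical vendor interface: (phone_number, message) -> True if delivered.
Sender = Callable[[str, str], bool]


def send_with_alternating_retry(
    senders: Sequence[Sender],
    phone: str,
    message: str,
    max_attempts: int = 4,
) -> int:
    """Alternate between vendors on each attempt.

    Returns the 1-based attempt number that succeeded, or 0 if every
    attempt failed.
    """
    for attempt in range(max_attempts):
        # Attempt 0 goes to the first (primary) vendor, attempt 1 to the
        # next one, and so on, wrapping around the list.
        sender = senders[attempt % len(senders)]
        try:
            if sender(phone, message):
                return attempt + 1
        except Exception:
            # Treat a vendor error like a failed delivery and move on.
            pass
    return 0
```

When every vendor in the list is degraded at the same time, as in this incident, a loop like this exhausts its attempts and returns 0, which corresponds to the case where some end-users received no message.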
Okta initially responded by routing the majority of requests away from the primary SMS provider in US Cell 3, but a subset of requests was impacted by an unrelated issue with our secondary provider, the root cause of which we are still investigating.
In response to this issue, Okta has implemented additional operational alerting on end-user SMS retry behavior. This additional alerting will provide earlier warning of, and faster response to, SMS-related issues in the future.
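For readers unfamiliar with this kind of alerting, one common approach is a sliding-window count of retry events. The sketch below is an assumption about how such an alert could work, not a description of Okta's monitoring: the class name `RetryRateAlert`, the window length, and the threshold are hypothetical.

```python
from collections import deque


class RetryRateAlert:
    """Signal when retries within a sliding time window reach a threshold."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: deque = deque()  # timestamps of recent retries

    def record_retry(self, timestamp: float) -> bool:
        """Record one end-user retry; return True if the alert should fire."""
        self._events.append(timestamp)
        # Drop retries that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) >= self.threshold
```

Alerting on end-user retries complements queue-depth monitoring because retries reflect what users actually experience, even when a vendor accepts a request but delivers it late.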