Root Cause Analysis - SMS Service Degradation - 10/19/2017
Published: Oct 23, 2017   -   Updated: Oct 23, 2017

Root Cause Analysis:
SMS Service Degradation
October 19, 2017

Problem Description & Impact:
On Thursday, October 19th, 2017, from 7:13am to 7:43am PDT and from 8:22am to 9:30am PDT, Okta observed a noticeable increase in the volume of SMS retries across all Cells except US Cell 3. During these windows, customers using SMS for Multifactor Authentication or Account Recovery may not have received the SMS message and would have been prompted to retry. Upon retry, automated fail-over to Okta’s other SMS vendor occurred and SMS messages were successfully delivered, which minimized the impact. All SMS issues were fully resolved by 10:17am PDT.

All other secondary authentication factors, such as Okta Verify, Voice Call, Duo, YubiKey, Google Authenticator, and Security Question, were not affected.

Root Cause:
Okta’s proactive monitoring began alerting us to a growing SMS message retry queue. Upon further investigation, Okta identified that our primary SMS vendor had experienced an outage, which caused sporadic failures in delivering SMS messages for a subset of requests. This resulted largely in delivery delays, with some end-users not receiving any SMS message. US Cell 3 was not impacted because an alternate SMS provider was already in place there and was unaffected by the outage.

Mitigating Steps & Corrective Actions:
Okta’s existing SMS logic routes all retry requests to an alternate SMS provider, which allowed users experiencing delays to successfully authenticate with SMS upon retry.
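
The retry routing described above can be illustrated with a minimal sketch. It is not Okta’s implementation: the class name `SmsRouter`, the provider labels, and the per-user attempt counter are all hypothetical and chosen only to show the idea that a first send goes to the primary provider while any retry goes to the alternate one.

```python
from dataclasses import dataclass, field


@dataclass
class SmsRouter:
    """Hypothetical router: first send uses the primary provider, retries use the alternate."""
    primary: str = "primary"      # assumed label, not a real vendor name
    alternate: str = "alternate"  # assumed label, not a real vendor name
    attempts: dict = field(default_factory=dict)  # user id -> sends requested so far

    def choose_provider(self, user_id: str) -> str:
        # Count this request, then route on whether it is the first send or a retry.
        count = self.attempts.get(user_id, 0)
        self.attempts[user_id] = count + 1
        return self.primary if count == 0 else self.alternate


router = SmsRouter()
first_send = router.choose_provider("user-1")  # initial request
retry_send = router.choose_provider("user-1")  # user asked for the SMS again
```

Under this scheme a user whose first message is lost in a primary-provider outage reaches the unaffected provider as soon as they retry, which matches the behavior reported in this incident.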

Okta initially responded to the problem by routing SMS requests away from the primary SMS provider to the secondary SMS provider for all affected Cells. This mitigated the problem and resolved the SMS message delivery failures.

In response to this issue, Okta has subsequently lowered the SMS retry volume thresholds that trigger operational alerts on end-user SMS retry behavior. This will provide earlier warning and faster switching from the primary to the secondary SMS provider during such external events.
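
As a rough illustration of why a lower threshold gives earlier warning, the sketch below counts retries in a sliding time window and reports when the count reaches a threshold. The class `RetryAlert`, the window length, and the threshold values are assumptions made for this example. The article does not state the actual values or how Okta’s alerting is built.

```python
from collections import deque


class RetryAlert:
    """Hypothetical sliding-window alert on SMS retry volume."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold            # retries in the window that trigger an alert
        self.window_seconds = window_seconds  # length of the sliding window
        self._timestamps = deque()            # times of recent retries, oldest first

    def record_retry(self, now: float) -> bool:
        """Record one retry at time `now`; return True if the alert should fire."""
        self._timestamps.append(now)
        cutoff = now - self.window_seconds
        # Drop retries that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Same retry stream, two illustrative thresholds (values are made up).
higher = RetryAlert(threshold=5, window_seconds=60)
lower = RetryAlert(threshold=3, window_seconds=60)
retry_times = (0, 10, 20)
higher_fired = [higher.record_retry(t) for t in retry_times]
lower_fired = [lower.record_retry(t) for t in retry_times]
```

With the same three retries, the lower threshold fires on the third retry while the higher one has not yet fired, so operators would be prompted to switch providers sooner.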