Root Cause Analysis - August 21 2017 - US Cell 5 Service Degradation
Published: Aug 23, 2017   -   Updated: Jun 22, 2018
Root Cause Analysis:
August 21, 2017 - Service Degradation

Problem Description & Impact:
On August 21, 2017, between 9:40am and 10:30am PDT, Okta experienced a service degradation in US Cell 5 in which a small number of interactive user sessions resulted in incomplete page loads after authentication. Full traffic analysis of the incident window indicates that approximately 1% of interactive user authentications encountered an HTTP 404 response when loading several page elements following authentication, rather than the users’ Okta dashboards loading successfully.
Root Cause:
Initial traces of user authentications revealed that users were being presented with HTTP 404 errors on several page components, including images, cascading style sheets, and JavaScript. Investigation found that during recent routine security hardening updates to the infrastructure responsible for hosting static content, the newest versions of the statically hosted files were not deployed as part of the node updates. While the majority of users were still successfully served the missing static content via Okta’s content delivery network (CDN), a small subset of users whose traffic was serviced by CDN nodes on which the static content cache had expired were impacted.
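The reason only a subset of users was affected can be illustrated with a toy model. This is a minimal sketch, assuming a simple time-to-live (TTL) edge cache in front of an origin; the `Origin` and `EdgeNode` classes, the file path, and the TTL values are hypothetical and are not taken from Okta's infrastructure or any specific CDN.

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Hypothetical origin node holding static files keyed by path."""
    files: dict = field(default_factory=dict)

    def get(self, path):
        if path in self.files:
            return 200, self.files[path]
        return 404, None


@dataclass
class EdgeNode:
    """Hypothetical CDN edge node with a per-entry TTL cache."""
    origin: Origin
    ttl: int
    cache: dict = field(default_factory=dict)  # path -> (body, expires_at)

    def get(self, path, now):
        entry = self.cache.get(path)
        if entry is not None and now < entry[1]:
            return 200, entry[0]  # still fresh: served from the edge cache
        # Cache miss or expired entry: go back to the origin.
        status, body = self.origin.get(path)
        if status == 200:
            self.cache[path] = (body, now + self.ttl)
        return status, body


path = "/js/app.v2.js"  # illustrative path only
origin = Origin({path: "v2"})
warm = EdgeNode(origin, ttl=100)   # cache entry outlives the incident
stale = EdgeNode(origin, ttl=10)   # cache entry expires before the incident
warm.get(path, now=0)
stale.get(path, now=0)

# The node update leaves the origin without the newest static file.
del origin.files[path]

warm_result = warm.get(path, now=50)    # served from cache
stale_result = stale.get(path, now=50)  # falls through to the origin
```

In this model the edge node with a still-valid cache entry keeps returning the file, while the node whose entry has expired passes the origin's 404 through to the user.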
Mitigation steps and future preventative measures:
Immediately following root cause determination, the recently updated infrastructure was patched to include the newest versions of static content. This action was completed and normal functionality resumed at 10:30am PDT for all user requests.
To prevent recurrence of this issue, additional automated post-deployment validation checks are being implemented to ensure the correct static content resides on the source nodes. In addition, the sensitivity of monitoring for this condition has been increased so that it triggers on a smaller number of 404s generated when retrieving static content.
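The two preventative measures could look roughly like the following. This is a minimal sketch under stated assumptions, not Okta's actual tooling: it assumes a deployment manifest mapping relative file paths to SHA-256 digests, and the function names `validate_static_content` and `should_alert`, as well as the default threshold, are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def validate_static_content(root, manifest):
    """Compare files under `root` with a manifest of {relative_path: sha256}.

    Returns a list of problem descriptions; an empty list means every
    expected file is present with the expected contents.
    """
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        target = Path(root) / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"stale or modified: {rel_path}")
    return problems


def should_alert(static_404s, static_requests, threshold=0.001):
    """Alert when the 404 rate for static content reaches `threshold`.

    The default threshold is an illustrative assumption; lowering it
    corresponds to increasing the monitor's sensitivity.
    """
    if static_requests == 0:
        return False
    return static_404s / static_requests >= threshold
```

A deployment pipeline could run `validate_static_content` against each source node after an update and fail the rollout if the returned list is non-empty, while `should_alert` would be evaluated over a rolling window of static-content requests.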