Root Cause Analysis - User Unexpectedly Deactivated And Attribute Updates Not Occurring On Import - 06/20/2017 - 06/27/2017
Published: Jun 29, 2017   -   Updated: Jun 22, 2018

Root Cause Analysis:
Users unexpectedly deactivated and attribute updates not occurring on import


Problem Description:

Okta experienced a service disruption in US Cell 4 in which a small number of customer tenants saw intermittent deprovisioning or missing profile updates for some Active Directory (AD) or LDAP-mastered users between 6/20/2017 and 6/27/2017. While working to resolve the issue, Okta placed US Cell 4 into Read-Only mode from 11:27pm PDT on 6/26/2017 until 12:10am PDT on 6/27/2017; administrative updates via Okta Admin or the API were unavailable during this time. The issue was fully resolved at 12:10am PDT on 6/27/2017.

Root Cause:

On 6/14/2017, Okta made an infrastructure change in US Cell 4 to revert a network address translation (NAT) configuration change that had been deployed to resolve a recent Workday import issue. This NAT change prevented Network Time Protocol (NTP) traffic from reaching US Cell 4 nodes. As a result, the clocks on the nodes responsible for processing AD and LDAP import data gradually drifted out of sync. Initially the drift was negligible, but as the clock skew grew, some AD or LDAP imports were processed incompletely, because import processing depends on the timestamps of the Active Directory and LDAP objects retrieved during the import.
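The RCA does not describe how Okta's import logic uses timestamps, so the following is only a hypothetical illustration of how clock skew can cause a timestamp-based incremental import to miss changes. It assumes the importing node records its own local time as a "last synced" watermark and later requests only the directory objects modified after that watermark. The names DirectoryObject, changed_since, and simulate are invented for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DirectoryObject:
    name: str
    modified: datetime  # stamped by the directory server's (accurate) clock


def changed_since(objects, watermark):
    """Return the objects whose modification time is strictly after the watermark."""
    return [o for o in objects if o.modified > watermark]


def simulate(skew):
    """Two hypothetical sync runs; the importing node's clock is off by `skew`."""
    t0 = datetime(2017, 6, 20, 12, 0)
    # Run 1 finishes at true time t0 and stores the node's *local* time as the watermark.
    watermark = t0 + skew
    # A user is updated in the directory two minutes after run 1 completes.
    directory = [DirectoryObject("alice", t0 + timedelta(minutes=2))]
    # Run 2 asks only for objects changed since the stored watermark.
    return [o.name for o in changed_since(directory, watermark)]


print(simulate(timedelta(0)))          # ['alice']
print(simulate(timedelta(minutes=5)))  # [] : alice's update is silently skipped
```

If the node's clock runs five minutes fast, the watermark lands five minutes in the future relative to the directory's clock, so any change stamped inside that window is never returned by later runs in this model.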
Monitoring intended to detect clock-skew anomalies within the US Cell 4 infrastructure was misconfigured and incorrectly reported the clock skew as acceptable.
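The RCA does not say how the clock-skew monitor was misconfigured. As a hypothetical sketch of what such a check might verify, the function below flags nodes whose measured offset from the reference time source exceeds a threshold in either direction, and treats a node with no offset data (for example, because NTP traffic is blocked) as an alert rather than a pass. The skew_alerts function, the node names, and the 0.5-second threshold are assumptions for illustration only.

```python
from typing import List, Mapping, Optional


def skew_alerts(offsets: Mapping[str, Optional[float]], max_skew_s: float = 0.5) -> List[str]:
    """Return alert messages for nodes whose clock offset is unknown or too large.

    `offsets` maps a node name to its measured offset from the reference time
    source in seconds (positive or negative), or None if no NTP data was obtained.
    """
    alerts = []
    for node, offset in sorted(offsets.items()):
        if offset is None:
            # Missing sync data is surfaced as a failure, never treated as "no skew".
            alerts.append(f"{node}: no NTP sync data")
        elif abs(offset) > max_skew_s:
            # abs() catches clocks running fast as well as slow.
            alerts.append(f"{node}: offset {offset:+.3f}s exceeds {max_skew_s}s")
    return alerts


print(skew_alerts({"node-a": 0.004, "node-b": -2.75, "node-c": None}))
# ['node-b: offset -2.750s exceeds 0.5s', 'node-c: no NTP sync data']
```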

Mitigation steps and future preventative measures:

Prior to 6/27/2017, the clock skew was small enough that only a small number of import jobs were processed incompletely. On 6/27/2017 at 3:25pm Pacific Time, Okta identified that multiple customers were experiencing an AD or LDAP import issue and initiated a Service Disruption event. Following root cause determination, US Cell 4 was placed into Read-Only mode for a brief period between 6/26/2017 @ 11:27pm and 6/27/2017 @ 12:12am PDT to remediate the issue. During this Read-Only period, the NAT configuration was updated so that NTP traffic reached the affected US Cell 4 nodes. After confirming that the clock skew was resolved across all US Cell 4 nodes, Okta’s engineering team validated that imports consistently returned all expected objects, and the issue was resolved.
In addition to resolving the root cause of this issue, Okta is implementing the following changes to prevent similar issues in the future:

  1. Okta will scope and implement an update to the AD/LDAP import process to make the synchronization logic resilient to clock skew.
  2. Okta has corrected the misconfigured clock-skew monitoring and added a secondary monitoring service to increase redundancy.
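Okta has not published how the import process will be made resilient to clock skew. One common pattern, shown here only as a sketch under that caveat, is to advance the sync watermark from the directory's own modification timestamps rather than the importing node's clock, and to re-read a small overlap window on each run so that changes near the boundary are not lost. The sync function, the DirectoryObject type, and the one-minute overlap are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DirectoryObject:
    name: str
    modified: datetime  # stamped by the directory server's clock


def sync(objects, watermark, overlap=timedelta(minutes=1)):
    """Skew-tolerant incremental sync (illustrative sketch, not Okta's design).

    - Looks back `overlap` before the watermark so boundary changes are re-read.
    - Advances the watermark from the directory's timestamps, never the local clock.
    """
    since = watermark - overlap if watermark is not None else None
    changed = [o for o in objects if since is None or o.modified > since]
    # Keep the old watermark when nothing changed.
    new_watermark = max((o.modified for o in changed), default=watermark)
    return changed, new_watermark


t0 = datetime(2017, 6, 20, 12, 0)
directory = [DirectoryObject("bob", t0 - timedelta(minutes=10))]
changed, wm = sync(directory, None)  # first (full) run: watermark = bob's timestamp
directory.append(DirectoryObject("alice", t0 + timedelta(minutes=2)))
changed, wm = sync(directory, wm)    # bob re-read in the overlap; alice picked up
print(sorted(o.name for o in changed), wm == t0 + timedelta(minutes=2))
# ['alice', 'bob'] True
```

Because the importing node's clock never enters the comparison, local drift cannot push the watermark ahead of directory time. The trade-off is that objects inside the overlap window are read more than once, so applying imported changes must be idempotent.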