Root Cause Analysis:
Okta experienced a service disruption in US Cell 4 in which a small number of customer tenants saw intermittent deprovisioning or missing profile updates for some Active Directory or LDAP mastered users between 6/20/2017 and 6/27/2017. While working to resolve the issue, Okta placed US Cell 4 into Read-Only mode on 6/27/2017 between 11:27pm and 12:10am PDT. Administrative updates via Okta Admin or the API would have been unavailable during this time. The issue was fully resolved on 6/27/2017 @ 12:10am PDT.
On 6/14/2017, Okta made an infrastructure change in US Cell 4 to revert a network address translation (NAT) configuration change, which was deployed to resolve a recent Workday Import issue. The NAT change prevented Network Time Protocol (NTP) traffic from flowing to US Cell 4 nodes. As a result, the clocks on the nodes responsible for processing AD and LDAP import data began to drift out of synchronization. Initially this drift was negligible within the US Cell 4 infrastructure, but as the time skew increased, some AD or LDAP imports were processed incompletely, because import processing is influenced by the timestamps on the Active Directory and LDAP objects retrieved during the import.
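The effect of clock skew on timestamp-driven imports can be illustrated with a minimal sketch. This is not Okta's import implementation; the function names, object fields, and the watermark-based filtering are assumptions chosen only to show how a node whose clock runs ahead can skip objects that were in fact modified after the previous import.

```python
from datetime import datetime, timedelta


def incremental_import(objects, watermark):
    """Hypothetical import step: keep objects modified after the watermark."""
    return [o for o in objects if o["modified"] > watermark]


def simulate(skew):
    """Return the DNs picked up when the importing node's clock is off by `skew`."""
    true_now = datetime(2017, 6, 20, 12, 0, 0)

    # Directory objects carry timestamps from the directory server's clock.
    objects = [
        {"dn": "cn=alice", "modified": true_now - timedelta(seconds=90)},
        {"dn": "cn=bob", "modified": true_now - timedelta(seconds=30)},
    ]

    # The previous import really finished 120 s ago, but the node recorded
    # the watermark with its own (skewed) clock.
    watermark = (true_now - timedelta(seconds=120)) + skew
    return [o["dn"] for o in incremental_import(objects, watermark)]


print(simulate(timedelta(0)))           # both objects are imported
print(simulate(timedelta(seconds=60)))  # cn=alice is silently skipped
```

With no skew, both objects fall after the watermark. With the node's clock 60 seconds ahead, the watermark lands after the change to cn=alice, so that update is missed even though the import itself reports no error.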
Mitigation steps and future preventative measures:
Prior to 6/27/2017 the clock skew was small enough that only a small number of import jobs were incompletely processed. On 6/27/2017 at 3:25pm Pacific Time, Okta identified that multiple customers were experiencing an AD or LDAP import issue and initiated a Service Disruption event. Following root cause determination, US Cell 4 was placed into Read-Only mode for a brief period between 6/26/2017 @ 11:27pm and 6/27/2017 @ 12:12am PDT to remediate the issue. During this Read-Only period, the network address translation configuration was updated to ensure that Network Time Protocol traffic flowed successfully to the affected US Cell 4 nodes. After confirming that the clock skew was resolved across all US Cell 4 nodes, Okta’s engineering team validated that imports were consistently returning all expected objects, and the issue was resolved.
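A check of the kind described above, confirming that clock skew is within tolerance on every node, might look like the following sketch. The node names, the measured offsets, and the one-second threshold are all hypothetical; the notice does not state how Okta measured skew or what tolerance it applied.

```python
def nodes_exceeding_skew(offsets_seconds, max_skew_seconds=1.0):
    """Return the sorted names of nodes whose clock offset exceeds the tolerance.

    offsets_seconds maps a node name to its measured offset from a reference
    clock, in seconds (positive means the node's clock runs ahead).
    """
    return sorted(
        node
        for node, offset in offsets_seconds.items()
        if abs(offset) > max_skew_seconds
    )


# Hypothetical measurements for three import-processing nodes.
measured = {"node-a": 0.02, "node-b": -45.7, "node-c": 0.4}
print(nodes_exceeding_skew(measured))  # ['node-b']
```

Running such a check on a schedule, and alerting when the returned list is non-empty, is one way to detect blocked time synchronization before the drift grows large enough to affect timestamp-dependent processing.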