Root Cause Analysis: Okta DNS Disruption - October 21st, 2016
https://support.okta.com/help/oktaarticledetailpage?childcateg=&id=ka02a000000xa7esas&source=documentation&refurl=http%3a%2f%2fsupport.okta.com%2fhelp%2fdocumentation%2fknowledge_article%2froot-cause-analysis-okta-dns-disruption-october-21st-2016
Published: Oct 25, 2016 - Updated: Nov 30, 2016


Summary: 

On October 21st, 2016, at approximately 4:10am PDT, customers hosted in Okta's US infrastructure began experiencing intermittent connectivity, authentication, MFA, and API issues as the result of a distributed denial-of-service (DDoS) attack against Okta's primary Domain Name System (DNS) provider, Dyn. Many customers reported a partial or complete outage when accessing the Okta service. After Okta remediated the DNS provider issue, some customers continued to experience residual connectivity issues caused by a server overload condition, locally cached DNS values, or problems connecting from Verizon networks.


Event Timeline: 

Beginning at 4:10am PDT on October 21, 2016, customers hosted in Okta's US service infrastructure experienced intermittent failures when attempting to connect to the Okta service. Okta immediately began investigating the connection failures.

At approximately 5:40am PDT, Okta identified the root cause of the issue as a distributed denial-of-service (DDoS) attack against its primary Domain Name System (DNS) provider, Dyn.

At 6:12am PDT, Okta migrated the Oktapreview DNS records to a secondary DNS provider unaffected by the DDoS attack, validating the corrective action plan. At 6:18am PDT, Okta migrated all remaining US infrastructure DNS records to this secondary provider. After the migration completed, a subset of customers continued to experience connectivity issues stemming from one of three root causes:

  1. Okta US Cell 1 and US Cell 2 Load Capacity 
    Okta customers hosted in Okta's US Cell 1 and US Cell 2 experienced intermittent connection errors. After the DNS provider change, customers who had previously been unable to reach Okta generated an unusually high number of concurrent requests, which caused these errors. Okta remediated the errors and increased server capacity to handle the unusually high load. The error rate declined steadily until the issue was fully resolved at 8:45am PDT.

  2. Cached DNS Values
    Although Okta completed the migration to the secondary DNS provider by 6:18am PDT, the new DNS settings took time to propagate across the internet. As a result, many customers saw sporadic DNS resolution failures when software clients and other devices relying on cached DNS values attempted to connect to Okta. Okta worked with impacted customers and provided steps to refresh or bypass stale DNS settings to restore access. At 10:53am PDT, Okta published a Knowledge Base (KB) article, linked from the active incident on Okta's Trust Page, with steps for self-service remediation of local DNS caching issues. After contacting Okta Support or following the steps in the KB article, most customers experiencing sporadic DNS resolution issues were able to flush their cached DNS settings and restore access to Okta. However, a small number of customers continued to experience access issues until 3:00pm PDT. These were primarily Okta Agent connectivity issues for Delegated Authentication and Multi-Factor Authentication, largely attributed to localized DNS caching within customers' environments. Restarting the impacted agent resolved these issues.

  3. Verizon Connection Issues
    During the same timeframe, a subset of customers also reported issues connecting to Okta from Verizon networks or devices. The issue abated at approximately 1:17pm PDT and required no remediation action by Okta.
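
The cached DNS values issue described above can be illustrated with a toy model. This is not Okta's resolver or agent code. The class TtlCache, the zone example.com, the provider hostnames, and the 3600-second TTL are all hypothetical. The model is a resolver that cached a zone's nameserver set before the provider migration. Until the cached entry's TTL expires, lookups keep returning the old provider. Flushing the cache, a rough stand-in for clearing a local DNS cache or restarting an agent, forces an immediate refresh.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical upstream lookup: returns (answer, ttl_seconds).
Upstream = Callable[[str], Tuple[str, int]]


@dataclass
class TtlCache:
    """Toy resolver cache that serves each entry until its TTL expires."""

    upstream: Upstream
    entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def lookup(self, name: str, now: float) -> str:
        cached = self.entries.get(name)
        if cached is not None and now < cached[1]:
            # Within the TTL: the authoritative data is not consulted at all.
            return cached[0]
        answer, ttl = self.upstream(name)
        self.entries[name] = (answer, now + ttl)  # store answer with its expiry time
        return answer

    def flush(self) -> None:
        # Rough stand-in for flushing a local DNS cache or restarting an agent.
        self.entries.clear()


ZONE = "example.com"
OLD_NS, NEW_NS = "ns1.old-provider.example", "ns1.new-provider.example"
authoritative = {ZONE: OLD_NS}  # hypothetical delegation before the migration


def upstream(name: str) -> Tuple[str, int]:
    return authoritative[name], 3600  # assumed one-hour TTL


waiting, flushing = TtlCache(upstream), TtlCache(upstream)
waiting.lookup(ZONE, now=0)  # both caches learn the old provider at t=0
flushing.lookup(ZONE, now=0)

authoritative[ZONE] = NEW_NS  # the DNS provider migration happens here

stale = waiting.lookup(ZONE, now=1800)  # OLD_NS: entry is still within its TTL
flushing.flush()
refreshed = flushing.lookup(ZONE, now=1800)  # NEW_NS: the flush forced a re-query
expired = waiting.lookup(ZONE, now=3600)  # NEW_NS: TTL elapsed, so the entry is refreshed
print(stale, refreshed, expired)
```

A shorter TTL narrows the window in which the waiting cache can return the old answer. A cache that holds entries longer than the record TTL, as some application runtimes can be configured to do, is only cleared by a flush or a process restart.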


Future Preventative Measures: 

Ultimately, our architectural and vendor choices are our responsibility, and we understand that any disruption in service can have a significant impact on our customers. To protect our customers and mitigate the impact on our infrastructure in the event of a similar attack, Okta is implementing the following preventative measures:

  1. Improved DNS Failover: (Phase 1) Okta has expanded to 3 DNS providers and has improved the DNS provider failover process. (COMPLETE)

  2. Eliminate DNS as a single point of failure: (Phase 2) Okta is exploring multiple solutions to improve DNS provider failover automation and to implement multiple primary DNS providers.
    UPDATE 10/27/2016: Okta has successfully deployed multiple primary DNS providers. (COMPLETE)

  3. Shorten DNS TTL Setting: Okta is investigating shortening the DNS time-to-live (TTL) setting with our DNS providers so that cached DNS records refresh sooner after a primary DNS provider change.
    UPDATE 11/29/2016: Okta has implemented the lowest TTL setting available with our DNS providers. (COMPLETE)
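
The benefit of multiple primary DNS providers can be sketched the same way. Recursive resolvers pick among the nameservers listed for a zone and retry others when one fails to respond. Everything in the example is an illustrative assumption: the function resolve_with_failover, the exception NameserverTimeout, and all hostnames. None of it reflects Okta's configuration or any resolver's actual selection algorithm. Real resolvers typically rank servers by measured response time rather than by list order.

```python
from typing import Callable, List


class NameserverTimeout(Exception):
    """Raised by a (simulated) nameserver that fails to answer."""


# Hypothetical query function: (nameserver, qname) -> answer.
Query = Callable[[str, str], str]


def resolve_with_failover(nameservers: List[str], qname: str, query: Query) -> str:
    """Ask each authoritative nameserver in turn; return the first answer."""
    failed: List[str] = []
    for ns in nameservers:
        try:
            return query(ns, qname)
        except NameserverTimeout:
            failed.append(ns)  # note the failure and move on to the next server
    raise NameserverTimeout(f"no answer from any of: {', '.join(failed)}")


# Assumed NS set for a zone served by two independent primary providers.
NS_SET = [
    "ns1.provider-a.example", "ns2.provider-a.example",
    "ns1.provider-b.example", "ns2.provider-b.example",
]
UNREACHABLE = {"ns1.provider-a.example", "ns2.provider-a.example"}  # provider A under DDoS


def simulated_query(ns: str, qname: str) -> str:
    if ns in UNREACHABLE:
        raise NameserverTimeout(f"{ns} timed out")
    return f"{qname} answered by {ns}"


answer = resolve_with_failover(NS_SET, "login.example.com", simulated_query)
print(answer)  # login.example.com answered by ns1.provider-b.example

# With only provider A delegated, every query fails.
try:
    resolve_with_failover(NS_SET[:2], "login.example.com", simulated_query)
except NameserverTimeout as exc:
    print(exc)  # no answer from any of: ns1.provider-a.example, ns2.provider-a.example
```

The key design point is that every listed nameserver serves identical zone data, so an answer from provider B is as valid as one from provider A. Keeping the providers' records in sync is outside the scope of this sketch.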


Forward Looking Statements 

The statements contained in this article that are not purely historical are forward-looking statements, including statements regarding Okta's future operating results, long-term business prospects, future product acceptance, and expectations, beliefs, intentions or strategies regarding the future. All forward-looking statements included in this article are based upon information available to Okta as of the date hereof, and Okta assumes no obligation to update any such forward-looking statements. Forward-looking statements involve risks and uncertainties, which could cause actual results to differ materially from those projected. The forward-looking product roadmap does not represent a commitment, obligation, or promise to deliver any product and is intended only to outline the general product development plans. Customers should not rely on roadmaps to make a purchasing decision.
