Okta provides all customers with Standard Disaster Recovery (DR) across two regions. Each region contains an active-active-active deployment across three availability zones. When a primary region fails (all availability zones within that region fail) due to an infrastructure outage, a failover to the disaster recovery region occurs.
In Standard DR, failover can take up to one hour to restore read-only access to core Okta services, after which users can authenticate to applications as needed. Restoring full read-write access to core Okta services (for example, configuration, settings, adding or removing users, and changing permissions) can take up to 24 hours.
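The two Standard DR upper bounds can be turned into concrete deadlines for an incident timeline. The following is a minimal sketch, not an Okta tool: the function name `standard_dr_windows` is hypothetical, and it assumes both windows are measured from the moment the failover begins, which the text above does not state explicitly.

```python
from datetime import datetime, timedelta, timezone

# Upper bounds quoted for Standard DR.
READ_ONLY_MAX = timedelta(hours=1)
READ_WRITE_MAX = timedelta(hours=24)


def standard_dr_windows(failover_start: datetime) -> dict[str, datetime]:
    """Return the latest expected restoration times for Standard DR.

    Assumption: both bounds are counted from failover_start.
    """
    return {
        "read_only_by": failover_start + READ_ONLY_MAX,
        "read_write_by": failover_start + READ_WRITE_MAX,
    }


# Example: a failover that begins at 09:00 UTC on 1 January 2024.
start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
windows = standard_dr_windows(start)
print(windows["read_only_by"].isoformat())
print(windows["read_write_by"].isoformat())
```

These are worst-case bounds, so actual restoration may happen sooner.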
Customers who are sensitive to downtime can purchase the Enhanced Disaster Recovery add-on.
Enhanced Disaster Recovery (EDR)
Enhanced DR is designed to remediate issues in which the underlying cloud service provider’s infrastructure experiences compute, storage, or networking problems that impact core Okta services. Symptoms may include elevated authentication failure rates, increased latency, HTTP error codes (e.g., 500), or an inaccessible login page.
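A customer-side monitor could flag these symptoms from its own sign-in probes. The sketch below is illustrative only: `HealthSample`, `shows_outage_symptoms`, and both thresholds are hypothetical and not part of any Okta product. It also detects symptoms only; it cannot tell whether the cause is an infrastructure problem or something Enhanced DR does not cover.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    status_code: int   # HTTP status returned by a sign-in probe
    latency_ms: float  # round-trip time of that probe


# Hypothetical thresholds; tune them to your own baseline.
MAX_FAILURE_RATE = 0.05
MAX_P95_LATENCY_MS = 2000.0


def shows_outage_symptoms(samples: list[HealthSample]) -> bool:
    """Return True if the samples show elevated 5xx errors or high latency."""
    if not samples:
        return False
    failures = sum(1 for s in samples if s.status_code >= 500)
    failure_rate = failures / len(samples)
    # Approximate 95th percentile using integer index arithmetic.
    latencies = sorted(s.latency_ms for s in samples)
    index = min(len(latencies) - 1, (95 * len(latencies)) // 100)
    p95 = latencies[index]
    return failure_rate > MAX_FAILURE_RATE or p95 > MAX_P95_LATENCY_MS


healthy = [HealthSample(200, 150.0)] * 20
degraded = healthy[:15] + [HealthSample(500, 150.0)] * 5
print(shows_outage_symptoms(healthy), shows_outage_symptoms(degraded))
```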
Enhanced Disaster Recovery does NOT provide protection against:
- Request floods, including DoS or DDoS attacks.
- Issues with ISV vendors and application connections.
- Code-related issues that are affecting Okta services.
- Bad actors deleting or modifying data.
- Unintended configuration mistakes caused by Customer Admins or incorrectly applying Okta configurations.
In the rare event of an infrastructure-related issue, failover can be triggered in three ways. Okta proactively monitors cell health and availability and will initiate a failover if an infrastructure-related issue is detected at the cell level, but customers now have additional options for manual control. In all instances, Okta will communicate the status of the failover and failback via e-mail and in-product banners in the Admin UI.
- Okta-Managed Failover (Proactive) - Okta continuously monitors service health. If Okta determines that a failover will effectively remediate a service incident, Okta will proactively fail over impacted organizations from the primary region to the DR region.
- Self-Service Failover - Customers can now use the Self-Service portal (Okta Disaster Recovery Admin app) or associated APIs to fail over and fail back impacted organizations without assistance from Okta. The use of Self-Service is optional. However, customers who invoke a failover via Self-Service are responsible for the failback, as Okta may not always know the reason for the customer-initiated failover.
- Support-Assisted Failover - Customers can always contact Okta Support and request a failover of the impacted organization(s). Please create the case as a Priority 1 (P1) case. The case will then be immediately escalated to Okta Operations, who will determine whether a failover is warranted. If the request is denied, it is sent back to Support for further communication with the customer to determine next steps. The reason for denial is provided to both Support and the customer.
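The three trigger paths and the Support-Assisted review can be summarized as a small model, for instance in an internal runbook. This is a sketch under assumptions, not Okta's API: every name below (`Trigger`, `customer_owns_failback`, `SupportDecision`, `review_p1_request`) is hypothetical. The text states failback responsibility only for Self-Service, so treating the other two paths as Okta-handled is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    OKTA_MANAGED = "okta-managed"
    SELF_SERVICE = "self-service"
    SUPPORT_ASSISTED = "support-assisted"


def customer_owns_failback(trigger: Trigger) -> bool:
    # Stated in the text only for Self-Service; the other paths are
    # assumed here to be failed back by Okta.
    return trigger is Trigger.SELF_SERVICE


@dataclass
class SupportDecision:
    failover_started: bool
    returned_to_support: bool
    denial_reason: Optional[str] = None


def review_p1_request(warranted: bool, reason: str = "") -> SupportDecision:
    """Model Okta Operations reviewing an escalated P1 failover request."""
    if warranted:
        return SupportDecision(failover_started=True, returned_to_support=False)
    # Denied: the case returns to Support, and the reason is shared
    # with both Support and the customer.
    return SupportDecision(
        failover_started=False, returned_to_support=True, denial_reason=reason
    )


denied = review_p1_request(False, "issue is not infrastructure-related")
print(customer_owns_failback(Trigger.SELF_SERVICE), denied.denial_reason)
```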
To learn more about Enhanced Disaster Recovery, please reach out to your account team.
Related References
- Frequently Asked Questions about Enhanced Disaster Recovery
- Frequently Asked Questions on Enhanced Disaster Recovery Self-Service
- Using Enhanced Disaster Recovery Self-Service
- Enhanced DR Product Document
- Enhanced DR Developer Document
