Root Cause Analysis
User Search Feature - Service Degradation
July 5, 2018
Problem Description & Impact
On Thursday, July 5th, 2018, beginning at approximately 10:55am PDT, Okta experienced a service degradation in the US Preview Cell. The issue impacted a subset of customers who had enabled the User Search early access feature within their Okta Preview tenant and made calls to the /api/v1/users?search API endpoint. Calls made during the impacted window failed with HTTP 503 errors. The errors persisted until the issue was fully resolved at 11:21am PDT. End user authentication was not affected during this time.
The root cause was traced to an inconsistency in the deployment and validation choreography between our test and preview environments for the Search micro-service. This inconsistency prevented an incompatible configuration setting from being detected in the test environment, causing the Search endpoint to fail and return 503 error responses.
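As an illustration only, the kind of inconsistency described above can be surfaced by comparing the ordered steps each environment runs during deployment and validation. The sketch below is a minimal example under stated assumptions: the function name, the step names, and the idea of representing the choreography as a list of strings are all hypothetical and do not describe Okta's actual tooling.

```python
from typing import Dict, List, Sequence


def choreography_drift(
    test_steps: Sequence[str], preview_steps: Sequence[str]
) -> Dict[str, List[str]]:
    """Report steps that appear in one environment's choreography but not the other.

    Each result list keeps the order of the environment it came from.
    """
    test_set, preview_set = set(test_steps), set(preview_steps)
    return {
        "only_in_test": [s for s in test_steps if s not in preview_set],
        "only_in_preview": [s for s in preview_steps if s not in test_set],
    }


# Hypothetical step names, chosen purely for illustration.
test_steps = ["deploy", "health_check"]
preview_steps = ["deploy", "validate_config", "health_check"]

drift = choreography_drift(test_steps, preview_steps)
if drift["only_in_test"] or drift["only_in_preview"]:
    print("Choreography mismatch between environments:", drift)
```

A check along these lines could run before a change is promoted, so that a difference between environments is flagged before it can hide an incompatible setting.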
Mitigation Steps and Recommended Future Preventative Measures
The issue was resolved by reverting the change that caused the regression, allowing the Search micro-service to recover by 11:21am PDT. The inconsistency in the Search micro-service's deployment and validation choreography has since been resolved.