Okta Workflows allows a great deal of flexibility in flow design, which makes the tool extremely powerful. However, that flexibility means it is important to think carefully about how each flow is designed. Flows built without a resilient design philosophy lead to difficulties later when troubleshooting, repairing, and recovering them.
At a high level, building flows so that they can be retried or resumed after an unexpected situation, failure, or error mitigates the impact when something does go wrong. The best way to implement this varies by use case, but every approach centers on the same goal: being able to retry or resume a flow without causing additional impact.
Below, Okta has gathered a few examples of resilient design practices to consider when designing flows.
These recommendations are not one-size-fits-all, but they provide a general idea of the best practices for redesigning flows to be more resilient.
Note: Beginning in February 2025, customers who exceed rate limits may see longer execution times on Okta Connector cards due to stricter enforcement of Okta API rate limits.
- If a customer hits a rate limit through the Okta Connector, the connector's underlying retry logic now waits until the estimated time at which that rate limit resets. As a result, bursts of 429 (Too Many Requests) errors may lead to longer execution times.
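Okta does not publish the connector's retry code, so the following Python sketch only illustrates the behavior described in the note, under stated assumptions: on a 429 it waits until the reset time reported in the `X-Rate-Limit-Reset` response header (UTC epoch seconds, as returned by the Okta API) and then retries. The `send` callable and both function names are hypothetical. Each 429 pauses the call until the rate-limit window resets, which is why a burst of 429 errors can add noticeably to execution time.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape: (HTTP status, response headers).
Response = Tuple[int, Mapping[str, str]]

def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Seconds to wait until the rate-limit window resets (never negative)."""
    if now is None:
        now = time.time()
    # X-Rate-Limit-Reset holds the reset time as UTC epoch seconds;
    # if the header is missing, fall back to "no wait".
    reset_epoch = float(headers.get("X-Rate-Limit-Reset", now))
    return max(0.0, reset_epoch - now)

def call_with_rate_limit_retry(send: Callable[[], Response],
                               max_attempts: int = 5,
                               sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`; on HTTP 429, wait until the window resets, then retry.

    max_attempts must be at least 1.
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers
        # Do not sleep after the final attempt; just report the 429.
        if attempt < max_attempts - 1:
            sleep(seconds_until_reset(headers))
    return status, headers
```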
Suggested Solutions:
Iterate through smaller batches of items rather than large batches all at once
- Running complex processing over a large batch of records (hundreds, thousands, or more) all at once means that anything unexpected can halt the entire process. This design is fragile and makes it difficult to recover when anything goes wrong.
- Instead, design flows so that they are easy to troubleshoot, retry, or halt if something unexpected occurs.
For example, if a flow needs to iterate through 10,000 users, split the users into small batches and mark each user as ‘processed’ in a custom attribute on their Okta user profile after the flow has operated on them. This way, if anything goes wrong with one batch, the other batches still process. It also reveals which users need to be retried: any user whose custom attribute is not marked as ‘processed’.
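Flows are assembled from cards rather than written as code, so the Python sketch below only models the logic of this pattern. It assumes an in-memory `profiles` dictionary whose `"processed"` key stands in for the custom profile attribute; `chunk`, `process_in_batches`, and `handle` are hypothetical names. A failure abandons the rest of its batch but not the later batches, and the return value is the list of users still needing a retry.

```python
from typing import Callable, Dict, List

def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def process_in_batches(user_ids: List[str],
                       profiles: Dict[str, dict],
                       handle: Callable[[str], None],
                       batch_size: int = 100) -> List[str]:
    """Process users in small batches and return the users that still need a retry.

    profiles maps user id -> profile dict; the "processed" key is a
    hypothetical stand-in for a custom Okta profile attribute.
    """
    # Skip users already marked as processed by an earlier run.
    pending = [u for u in user_ids if not profiles[u].get("processed")]
    for batch in chunk(pending, batch_size):
        try:
            for user_id in batch:
                handle(user_id)
                # Mark the user only after the work succeeded.
                profiles[user_id]["processed"] = True
        except Exception:
            # Abandon the rest of this batch but keep going with the next one.
            continue
    # Anyone not marked as processed still needs a retry.
    return [u for u in user_ids if not profiles[u].get("processed")]
```

Running the function again after a partial failure only touches the users that were never marked, which is what makes the retry cheap.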
Retrieve an updated/filtered list of items to iterate through when possible
- If the goal is to process 100 records but the flow hits an unexpected problem halfway through the list, it should be possible to filter the list down to the records that have not yet been operated on, so there is no need to start from scratch.
For example, if a flow removes a user from a list of applications and gets stuck halfway through the list, build the flow so that it retrieves an updated list of applications the next time it runs for that user. The flow can then be triggered manually for that user and operate only on the applications that still need to be removed, rather than on the entire original list.
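The key idea is that the list is re-fetched on every run instead of being saved from the first attempt. The Python sketch below models that under stated assumptions: `list_assigned_apps` and `unassign` are hypothetical callables standing in for whatever cards list a user's application assignments and remove one; no real Okta API is called.

```python
from typing import Callable, List

def remove_user_from_apps(user_id: str,
                          list_assigned_apps: Callable[[str], List[str]],
                          unassign: Callable[[str, str], None]) -> List[str]:
    """Unassign a user from every app that is still assigned to them.

    The list is fetched fresh on each run, so a rerun after a partial
    failure only touches the apps that remain assigned.
    """
    removed = []
    for app_id in list_assigned_apps(user_id):
        unassign(user_id, app_id)
        removed.append(app_id)
    return removed
```

If the first run fails on the third of four apps, the second run sees only the two apps that are still assigned.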
Implement a combination of an event-based flow and a scheduled flow
- The event-based flow handles everything under normal conditions, while a scheduled flow runs periodically (usually once a day) to catch and recover anything that gets ‘stuck’ in the event-based flow.
For example, if a flow monitors for users added to a group, the event-based flow could remove each user from the group after processing them. If an execution gets stuck, it does not remove the user from the group. Then, overnight, a scheduled flow could retrieve any users still in the group and process them to make sure they are processed successfully.
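The group-as-work-queue pattern can be modeled in a few lines of Python. This is a sketch under stated assumptions, not Workflows syntax: a plain `set` stands in for the Okta group, and `on_user_added`, `nightly_sweep`, and `process` are hypothetical names for the event-based flow, the scheduled flow, and the per-user work.

```python
from typing import Callable, Set

def on_user_added(user_id: str,
                  staging_group: Set[str],
                  process: Callable[[str], None]) -> None:
    """Event-based path: process the user, then remove them from the group."""
    process(user_id)
    # Only reached if processing succeeded, so failures stay in the group.
    staging_group.discard(user_id)

def nightly_sweep(staging_group: Set[str],
                  process: Callable[[str], None]) -> Set[str]:
    """Scheduled path: retry everyone still in the group; return those still stuck."""
    # sorted() iterates over a copy, so removing members inside the loop is safe.
    for user_id in sorted(staging_group):
        try:
            on_user_added(user_id, staging_group, process)
        except Exception:
            pass  # leave the user in the group for the next sweep
    return set(staging_group)
```

Because both paths share the same "remove only after success" rule, the sweep can run any number of times without reprocessing users who already succeeded.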
Concurrency values on List operations can be used with discretion to increase throughput
- When the concurrency on a card such as "For Each" is set to 1 and a helper flow execution stays “In Progress,” the For Each card cannot move on to the next item in the list. Increasing the concurrency slightly can alleviate this.
For example, if using a List operation with a “concurrency” setting, set the concurrency value between 2 and 5 instead of 1. This allows the rest of the list to continue processing through the helper flow even if one of the helper flow executions gets stuck in progress.
NOTE: This does not allow the parent execution to continue; it only allows the other helper flow executions to proceed.
NOTE: Be aware of rate limits (Okta's or other services') when increasing concurrency values, because multiple helper executions run at once.
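As a rough analogy only, Python's `concurrent.futures.ThreadPoolExecutor` behaves like a For Each card with a concurrency setting: `max_workers` caps how many helpers run at once. The sketch below assumes that analogy; `for_each` and `helper` are hypothetical names, and Okta Workflows does not run flows this way internally. As with the parent execution described in the first NOTE, the call returns only after every helper finishes, and a higher concurrency means more simultaneous calls against rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def for_each(items: Iterable[T], helper: Callable[[T], R], concurrency: int = 3) -> List[R]:
    """Run `helper` on each item with at most `concurrency` helpers at once.

    With concurrency=1, one slow helper blocks every later item; with a
    small value such as 3, the other items keep flowing past it. Like the
    parent flow, this call still waits for all helpers, including the slow
    one, and returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(helper, items))
```

For example, if the first item's helper is stuck waiting, the remaining items can still complete on the other workers when `concurrency=3`, whereas with `concurrency=1` they would queue behind it.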
Add WAIT FOR to help mitigate race conditions
- When flows interact with other processes, it is essential to define the dependencies among tasks clearly and to ensure that upstream and downstream actions run in the proper sequence. Otherwise, the outcome may be incorrect. The WAIT FOR card can help mitigate these race conditions.
For example:
- A tool uses Okta API calls to create new Okta users and perform some subsequent tasks.
- An Okta workflow triggered by the user creation activates the users and sends out activation emails.
Occasionally, the workflow failed to activate users. Investigation revealed that, during these failures, the new user was not yet in the correct status because the process initiated by the tool was still active. To resolve this issue, place a WAIT FOR card (10 seconds) as the first card in the workflow.
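The fix above can be modeled in Python as a pause at the start of the handler. This is a sketch under stated assumptions: `get_status` and `activate` are hypothetical callables, `"STAGED"` is only an assumed "ready to activate" status, and the status check after the wait is an extra safeguard that the article does not prescribe. The `sleep` parameter is injectable so the logic can be exercised without really waiting 10 seconds.

```python
import time
from typing import Callable

def activate_after_wait(user_id: str,
                        get_status: Callable[[str], str],
                        activate: Callable[[str], None],
                        wait_seconds: float = 10.0,
                        expected_status: str = "STAGED",  # assumed ready status
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Pause, then activate the user only if they are in the expected status.

    Returns True if the user was activated, False if they were not ready.
    """
    # Give the upstream tool time to finish (like WAIT FOR as the first card).
    sleep(wait_seconds)
    # Re-read the user's status after the wait rather than trusting the event.
    if get_status(user_id) != expected_status:
        return False  # not ready; leave it for a retry or a scheduled sweep
    activate(user_id)
    return True
```

A fixed delay narrows the race window but cannot guarantee ordering, which is one reason to pair it with a check like this or with a scheduled recovery flow.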