The Admin UI pushes configuration changes to the Admin node, which then sends them to all the workers in the cluster. This article describes an edge case in which changes pushed from the UI are not reflected when users access the applications, even though the NGINX service does not report any issue.
- Okta Access Gateway (OAG)
- Okta Access Gateway High Availability
- incrond
When changes are pushed through the UI, the Admin node receives an event to create a specific configuration. These events are processed by the "incrond" service, followed by a reload of nginx on the Admin node. After the events are processed successfully, they are pushed to all workers, where incrond again generates the nginx-specific configuration, followed by a reload.
If the "incrond" service is not running, changes will not be reflected and nginx will keep serving the last successful configuration. If the issue occurs on the Admin node, none of the workers will receive the updated configuration, because events reach the workers only after the Admin node has processed them.
The service status can be validated by:
sudo systemctl status incrond
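For a scripted check, the status lookup can be wrapped in a small function. This is a minimal sketch: the function name `service_state` is hypothetical, and it assumes a systemd-based node where `systemctl is-active` is available.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OAG): print "running" or "stopped"
# for a systemd unit, defaulting to incrond.
service_state() {
  local unit="${1:-incrond}"
  # is-active exits 0 only when the unit is currently active
  if systemctl is-active --quiet "$unit"; then
    echo "running"
  else
    echo "stopped"
    return 1
  fi
}
```

Calling `service_state incrond` prints a single word and returns a non-zero exit code when the service is not active, which makes it easy to use in an `if` statement.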
The following commands can be used to check whether the node has stuck events:
- The UI pushes events to "/opt/oag/events", so this location can be checked for unprocessed events. When there is no issue, the location is empty, because events are usually processed within a couple of seconds:
sudo ls -ltr /opt/oag/events
- After events are processed successfully, the event files are moved to "/opt/oag/events_processed". This location can be checked to see when the last successful sync took place:
sudo ls -ltr /opt/oag/events_processed | tail
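Because healthy events disappear from "/opt/oag/events" within seconds, any file that lingers there for a minute or more is a likely sign of a stuck queue. The sketch below lists such files. The function name `stuck_events` and the one-minute default are assumptions chosen for illustration, not OAG settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OAG): list event files that were last
# modified more than roughly MINUTES minutes ago (default 1).
# A healthy node should print nothing.
stuck_events() {
  local dir="${1:-/opt/oag/events}"
  local minutes="${2:-1}"
  # -maxdepth 1 keeps the search to the events directory itself
  find "$dir" -maxdepth 1 -type f -mmin +"$minutes"
}
```

Run as root on the node, for example `stuck_events /opt/oag/events 1`. Empty output suggests events are being processed. Any listed file is worth investigating.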
One potential reason is that the service stopped during an upgrade or after a reboot. The commands below can be used to check whether the service stopped after the upgrade:
- Check the timestamp of yumUpdateOutput.log to see when the upgrade completed:
sudo ls -l /opt/oag/upgrades/current/yumUpdateOutput.log
- Use that timestamp to look in the cron log file for any errors:
sudo less /var/log/cron
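Paging through the whole cron log can be slow, so it may help to filter it first. The sketch below assumes that incrond entries in the log contain the string "incrond", which should be confirmed on the node. The function name `incrond_log_lines` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OAG): print the lines of a cron log
# that mention incrond, case-insensitively.
# Assumption: incrond entries contain the literal string "incrond".
incrond_log_lines() {
  local log="${1:-/var/log/cron}"
  grep -i 'incrond' "$log"
}
```

Run as root, for example `incrond_log_lines /var/log/cron | tail`, then compare the timestamps of the last entries with the upgrade time taken from yumUpdateOutput.log.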
- Validate the service status with the command below:
sudo systemctl status incrond
- If the service is stopped, proceed with the steps below:
  - Move the events. This is required so that old events do not affect the configuration when they get processed.
    - Make a folder to hold the old events:
    sudo mkdir /tmp/old_events
    - Move the old events to the newly created folder:
    sudo mv /opt/oag/events/* /tmp/old_events
    - Validate that the old events have been moved and reside in the backup location:
    sudo ls -ltr /opt/oag/events
    sudo ls -lrth /tmp/old_events
  - Start the service after making sure there are no events in "/opt/oag/events":
  sudo systemctl start incrond
  - Resend the changes from the UI. This can be done by simply editing and saving the application. Since the UI already has the changed configuration, it will generate a new event and send it to the Admin node for processing. Check "/opt/oag/events" again to make sure the events are no longer getting stuck:
  sudo ls -ltr /opt/oag/events
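The move-and-restart steps above can be sketched as a pair of shell functions. This is an illustration only: the function names `quarantine_events` and `recover_incrond` are hypothetical, and the default paths are the ones used in this article. The sketch uses `find` rather than a shell glob so that it also behaves sensibly when the events directory is already empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OAG): move every pending event out of
# SRC into DEST, and succeed only if SRC ends up empty.
quarantine_events() {
  local src="${1:-/opt/oag/events}"
  local dest="${2:-/tmp/old_events}"
  mkdir -p "$dest" || return 1
  # -mindepth 1 skips SRC itself; each entry is moved individually
  find "$src" -mindepth 1 -maxdepth 1 -exec mv -- {} "$dest"/ \; || return 1
  # Report success only when nothing is left behind in SRC
  [ -z "$(find "$src" -mindepth 1 -maxdepth 1)" ]
}

# Start incrond only after the events directory is confirmed empty.
recover_incrond() {
  quarantine_events "$@" && systemctl start incrond
}
```

Both functions need root privileges on the node. After `recover_incrond` returns successfully, the application still has to be edited and saved in the UI so that a fresh event is generated.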
Refer to the Okta Access Gateway High Availability options to check the status of all the nodes after the sync. If the issue occurred on a worker node, a sync from the Admin node can be performed so that the worker receives all the events from the Admin node.
