When configuring high availability (HA) or during a sync from admin, the following generic error may appear on the worker node if one or more configuration files cannot be copied from the admin node. This article provides steps to find the underlying issue:
Failed to synchronize events from admin.
FAILURE RC =2
- Okta Access Gateway (OAG)
- High Availability (HA)
- Secure Sockets Layer (SSL)
After a node is added to the cluster during HA configuration, the new worker pulls all application configuration files, along with the certificates and keys, from the admin node. If any of these files cannot be copied, the console logs the generic error "FAILURE RC=2".
One known cause of this issue occurs on versions prior to 2023.11, where the symbolic link created for multiple SANs was not removed when a certificate was deleted. As a result, the symbolic link points to a file that no longer exists and cannot be read when the worker tries to pull the configuration. The issue can also be seen during sync from admin.
For details, refer to OKTA-641512 in the OAG release notes.
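The mechanism can be reproduced in a scratch directory. The sketch below is an illustration only: the file names (example.crt, san-alias.crt) are hypothetical stand-ins for a certificate and its symbolic link, and nothing under /opt/oag is touched.

```shell
# Illustration only: hypothetical file names in a throwaway directory.
workdir=$(mktemp -d)
echo "dummy certificate" > "$workdir/example.crt"
ln -s "$workdir/example.crt" "$workdir/san-alias.crt"   # stands in for the SAN link

rm "$workdir/example.crt"                                # certificate deleted, link left behind

# The link itself still exists (-L), but its target does not (-e follows the link).
if [ -L "$workdir/san-alias.crt" ] && [ ! -e "$workdir/san-alias.crt" ]; then
    link_state="dangling"
else
    link_state="ok"
fi
echo "san-alias.crt is $link_state"

# Reading through the link fails, which is what any copy of the file would run into.
if cat "$workdir/san-alias.crt" >/dev/null 2>&1; then
    read_result="readable"
else
    read_result="unreadable"
fi
echo "san-alias.crt is $read_result"
```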
The following steps can be used to troubleshoot the underlying issue:
- Connect through the shell on the affected worker node.
- Switch to root:
sudo su -
- Execute the command below to trigger the sync step manually. Make sure to add the admin node hostname:
/opt/oag/scripts/oagHAsetup.sh -s -m <admin-node-hostname>
- Below is an example output when it is working as expected**. The output helps identify which file is causing an issue with the sync:
worker3.support.lab;/root# /opt/oag/scripts/oagHAsetup.sh -s -m oag.support.lab
Starting up connection to admin..
Generating list of configurations on admin..
Pulling spgw config from admin: spgw.json
sleeping for 5 secs to process configuration
Pulling auth config from admin: auth.82db97a4-170f-47fd-8f73-0f49c78fd4de.json
Pulling auth config from admin: store.auth.json
Pulling idp config from admin: idp.16e34e7d-010d-4ecd-8a62-1974b5892aaa.json
Pulling krb config from admin: krb5.67137634-6556-4448-867d-f4d27b20dd0e.json
Pulling loglevel config from admin: loglevel.local.json
Pulling store config from admin: store.spgw.json
<truncated>
Pulling ebsssoagent config from admin: ebsssoagent.a7a56e33-21a7-4f72-8611-4ddb0a1ca499.json
Pulling app config from admin: app.2779a889-0ca3-4e86-9c0b-afd863fb9e9b.json
<truncated>
Pulling app config from admin: app.c2e20d08-b1aa-4d66-9d97-989ba6d20017.json
chgrp: cannot access '/opt/oag/configs/simpleSAMLphp/config/cert/*.key': No such file or directory
Unit /etc/systemd/system/oag-admin.service is masked, ignoring.
- For the issue described in the cause section, the script produces the following result:
worker3.support.lab;/root# /opt/oag/scripts/oagHAsetup.sh -s -m oag.support.lab
Starting up connection to admin..
Generating list of configurations on admin..
Pulling spgw config from admin: spgw.json
sleeping for 5 secs to process configuration
<truncated>
Pulling ebsssoagent config from admin: ebsssoagent.a7a56e33-21a7-4f72-8611-4ddb0a1ca499.json
Pulling app config from admin: app.2779a889-0ca3-4e86-9c0b-afd863fb9e9b.json
Unable to get SSL key/cert from host: oag.support.lab
Exiting.
- The error, Unable to get SSL key/cert, shows that the worker cannot get the certificate and key files from the admin node. To troubleshoot further, connect to the admin node through the shell and check the certificate directory /opt/oag/nginx/ssl. Any dead link will show as a blinking red entry and can be removed safely with the command below. Do not delete any file other than the red symbolic link.
sudo rm <link>
- Try the sync from admin again from the worker node.
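When the terminal does not show colors, dead links can also be listed with find. The following is a minimal sketch rather than an Okta-provided tool: the function name list_dangling_links is hypothetical, and it only prints candidates without deleting anything.

```shell
# List symbolic links under a directory whose targets no longer exist.
# Usage: list_dangling_links <directory>
list_dangling_links() {
    dir=$1
    # -type l selects symbolic links; `test -e` follows each link and
    # fails when the target is missing, so only dead links are printed.
    find "$dir" -type l ! -exec test -e {} \; -print
}
```

On the admin node this could be run as `list_dangling_links /opt/oag/nginx/ssl`. Review each path it prints before removing it with sudo rm, and leave every other file in the directory untouched.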
**The chgrp error can be ignored. It will be handled in a future release and does not affect the sync flow.
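To narrow down where a sync stopped, the script output can be saved to a file and filtered. The helpers below are a sketch that assumes the output lines look like the samples in this article ("Pulling <type> config from admin: <file>"). The function names are hypothetical and not part of OAG.

```shell
# Print the last configuration file the sync reported pulling.
# Usage: last_pulled_config <saved-output-file>
last_pulled_config() {
    grep 'Pulling .* config from admin:' "$1" | tail -n 1
}

# Print everything except progress lines and the harmless chgrp message,
# so that lines such as "Unable to get SSL key/cert" stand out.
# Usage: unexpected_lines <saved-output-file>
unexpected_lines() {
    grep -v -e 'Pulling .* config from admin:' -e '^chgrp: cannot access' "$1"
}
```

For example, the output could be captured with `/opt/oag/scripts/oagHAsetup.sh -s -m <admin-node-hostname> 2>&1 | tee /tmp/oag-sync.log` (the log path is an arbitrary choice), and then inspected with `last_pulled_config /tmp/oag-sync.log` and `unexpected_lines /tmp/oag-sync.log`.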
