Yesterday I had to troubleshoot my vRA 8.4 installation because I could no longer log in to the appliance.
As you may know, since version 8.0 the vRealize Automation appliance has used a container-based design built on Kubernetes, so troubleshooting means using the vRA CLI (vracli) and the Kubernetes command-line tool (kubectl).
This post shows some of the CLI commands I used for troubleshooting after logging in via SSH as the root user.
My troubleshooting steps were:
- Check Pods & Services Status
- Display vRA Cluster Status
- Verify the vRA Deployment Status
- Check Deployment Log File
- Stopping / Shutting Down vRA Cluster
- Starting vRA Cluster
- vRA 8.4 Bad Gateway / Gateway Timeout
Check Pods & Services Status
kubectl -n prelude get pods
Display vRA Cluster Status
Verify the vRA Deployment Status
The output of this command is one of two variants: “Deployment not complete” if the appliance is still deploying or starting up, or “Deployment complete” once it has finished.
vracli status deploy
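Since the deployment can take a while, it can be handy to poll the status instead of re-running the command by hand. The following is only a minimal sketch: the function name wait_for_deploy and its arguments are my own invention, and it assumes the output contains the exact phrase "Deployment complete" as described above.

```shell
#!/bin/sh
# Poll a status command until its output contains "Deployment complete".
# Usage: wait_for_deploy MAX_ATTEMPTS SLEEP_SECONDS COMMAND [ARGS...]
# (hypothetical helper, not part of vRA)
wait_for_deploy() {
  max_attempts=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # "Deployment not complete" does not contain this exact phrase,
    # so it will not match by accident.
    if "$@" 2>/dev/null | grep -q "Deployment complete"; then
      echo "Deployment complete after $attempt check(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "Deployment still not complete after $max_attempts check(s)" >&2
  return 1
}

# Example on the appliance: check once a minute, up to 40 times.
# wait_for_deploy 40 60 vracli status deploy
```

The status command is passed in as arguments so the loop can be tried out with any command before pointing it at vracli.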
Check Deployment Log File
The log file of the pod deployment can be followed with:
tail -f /var/log/deploy.log
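If the log is long, filtering it can be quicker than watching the tail. This is a rough sketch only: the helper name recent_errors is hypothetical, and the keywords "error" and "fail" are a guess at what problem lines look like, not something taken from the vRA log format.

```shell
#!/bin/sh
# Print the last COUNT lines (default 20) of a log file that mention
# "error" or "fail", case-insensitively, prefixed with their line numbers.
# Usage: recent_errors LOGFILE [COUNT]
# (hypothetical helper; the keywords are an assumption)
recent_errors() {
  logfile=$1
  count=${2:-20}
  grep -inE 'error|fail' "$logfile" | tail -n "$count"
}

# Example on the appliance:
# recent_errors /var/log/deploy.log
```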
Stopping / Shutting Down vRA Cluster
These commands shut down vRealize Automation on all of the cluster nodes: they stop the services, sleep for 2 minutes, and clean the current deployment before shutting down the appliance. Check the official docs for up-to-date procedures.
/opt/scripts/svc-stop.sh
sleep 120
/opt/scripts/deploy.sh --onlyClean
shutdown -h now
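Because the last step powers off the appliance, it can help to wrap the sequence so it stops at the first failing step and can be previewed first. The sketch below is my own wrapper, not an official script: the names run and stop_vra and the DRY_RUN variable are hypothetical, while the commands themselves are the ones from this section.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop services, wait 2 minutes, clean the deployment, then power off.
# Aborts before the power-off if an earlier step fails.
# (hypothetical wrapper around the commands from this section)
stop_vra() {
  run /opt/scripts/svc-stop.sh || return 1
  run sleep 120
  run /opt/scripts/deploy.sh --onlyClean || return 1
  run shutdown -h now
}

# Preview only:  DRY_RUN=1; stop_vra
# Real run:      stop_vra
```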
Starting vRA Cluster
Power on each of the appliances and wait for them to boot completely before proceeding, that is, until the appliance console shows the blue welcome page. Ensure that all prerequisite servers, such as vRealize Identity Manager (vIDM), are also started.
The first command below runs the deploy.sh script to deploy all prelude services, and the kubectl command then shows the status of all the running pods or ‘services’. This process can take 30+ minutes. If the appliance has insufficient memory, the deployment will time out after 40 minutes. Check the official docs for up-to-date procedures.
/opt/scripts/deploy.sh
kubectl -n prelude get pods
vRA 8.4 Bad Gateway / Gateway Timeout
After starting up your vRA appliances, you may find that the UI loads but shows a Bad Gateway or Gateway Timeout error. This is usually because the appliance is still starting up. Provided the appliance has enough resources assigned to it, the UI will eventually load, and as described above, the status of the deployment can be checked with the command below. Check the READY column and confirm that all pods are ready for use. Any pod with a READY value of 0/1 is not available yet. Once all pods are listed as 1/1 or 2/2, the UI will be available for use.
kubectl -n prelude get pods
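With many pods in the list, it is easy to overlook the one that is still at 0/1. As a small sketch, the filter below reads the pod listing on stdin and prints only the pods whose READY counts do not match. The function name not_ready_pods is hypothetical, and skipping pods in the Completed state is my own assumption (finished jobs normally show 0/1 in Kubernetes).

```shell
#!/bin/sh
# Read `kubectl get pods` output on stdin and print NAME, READY and STATUS
# for every pod whose READY value is incomplete (for example 0/1 or 1/2).
# The header line is skipped, as are pods with STATUS "Completed".
# (hypothetical helper)
not_ready_pods() {
  awk 'NR > 1 && $3 != "Completed" {
    split($2, ready, "/")
    if (ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Example on the appliance:
# kubectl -n prelude get pods | not_ready_pods
```

Empty output means every remaining pod reports all of its containers as ready.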
vRA 8.4 Still Not Working After All of the Steps Above
After trying all of the above, sometimes vRA just won’t come back online after a failure. If this is the case, run the command above to check the status of the pods. If they are all online except the postgres database pod, try the command below to restart the kubelet service. Once it has run, leave the appliance alone for the next 30 minutes, as vRA will restart itself and try to come back online cleanly.
systemctl restart kubelet