Yesterday I had to troubleshoot my vRA 8.4 installation because I could no longer log in to the appliance.
As you may know, with version 8.0 the vRealize Automation appliance moved to a container-based design running on Kubernetes, so troubleshooting means working with the vracli and kubectl command-line tools.
This post shows some of the CLI commands I used for troubleshooting, after logging in as the root user via SSH.
My steps of troubleshooting were:
- Check Pods & Services Status
- Display vRA Cluster Status
- Verify the vRA Deployment Status
- Check Deployment Log File
- Stopping / Shutdown vRA Cluster
- Starting vRA Cluster
- vRA 8.4 Bad Gateway / Gateway Timeout
- vRA 8.4 still not working after all of the above steps
Check Pods & Services Status
kubectl -n prelude get pods
Display vRA Cluster Status
vracli status
Verify the vRA Deployment Status
The output of this command will be one of two values: "Deployment not complete" if the appliance is still deploying or starting up, or "Deployment complete" once it has finished.
vracli status deploy
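If you want to wait for the deployment to finish from a script, the status check above can be wrapped in a simple polling loop. This is only a sketch: the `STATUS_CMD` variable is a stand-in so the loop can be tried outside the appliance; on a real vRA node you would set it to `vracli status deploy`.

```shell
# Poll the deployment status until it reports "Deployment complete".
# STATUS_CMD is a placeholder so the sketch runs without vracli installed;
# on the appliance, use: STATUS_CMD="vracli status deploy"
STATUS_CMD=${STATUS_CMD:-"echo Deployment complete"}
until status=$($STATUS_CMD) && echo "$status" | grep -q "Deployment complete"; do
  echo "Deployment not complete yet, waiting 60s..."
  sleep 60
done
echo "vRA deployment is complete"
```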
Check Deployment Log File
Logfile of the pod deployment is located as follows:
tail -f /var/log/deploy.log
Stopping / Shut down vRA Cluster
This sequence shuts down vRealize Automation on all of the cluster nodes by stopping the services, sleeping for two minutes, cleaning the current deployment, and then powering off the appliance. Check the official VMware documentation for up-to-date procedures.
/opt/scripts/svc-stop.sh
sleep 120
/opt/scripts/deploy.sh --onlyClean
shutdown -h now
Starting vRA Cluster
Power on each appliance and wait for it to boot completely before proceeding; the appliance console will show the blue welcome page when it is ready. Make sure all prerequisite servers, such as vRealize Identity Manager (vIDM), are also started. The deploy.sh script then deploys all of the prelude services, and the kubectl command shows the status of the running pods, or 'services'. This process can take 30+ minutes, and if the appliance has insufficient memory it will time out at 40 minutes. Check the official VMware documentation for up-to-date procedures.
/opt/scripts/deploy.sh
kubectl -n prelude get pods
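Rather than re-running the pod listing by hand during start-up, the readiness check can be scripted. Again, just a sketch: `PODS_CMD` is a placeholder (with invented pod names) so the loop runs without a live cluster; on the appliance you would set it to `kubectl -n prelude get pods --no-headers`.

```shell
# Wait until every pod reports all of its containers ready (READY column x/x).
# PODS_CMD is a stand-in with hypothetical pod names so the sketch runs
# without kubectl; on the appliance, use:
#   PODS_CMD="kubectl -n prelude get pods --no-headers"
PODS_CMD=${PODS_CMD:-"printf 'pod-a 1/1 Running 0 5m\npod-b 2/2 Running 0 5m\n'"}
while true; do
  not_ready=$(eval "$PODS_CMD" | awk '{ split($2, r, "/"); if (r[1] != r[2]) n++ } END { print n+0 }')
  [ "$not_ready" -eq 0 ] && break
  echo "$not_ready pod(s) not ready yet, checking again in 30s..."
  sleep 30
done
echo "All pods are ready"
```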
vRA 8.4 Bad Gateway / Gateway Timeout
After starting up your vRA appliances, you may find that the UI loads but shows a Bad Gateway or Gateway Timeout error. This usually means the appliance is still starting up. Provided the appliance has enough resources assigned, the UI will eventually load; in the meantime, the status of the deployment can be checked using the command below. Check the READY column and confirm that all pods are ready for use. Any pod with a READY value such as 0/1 is not available yet; once all pods show 1/1 or 2/2, the UI will be available for use.
kubectl -n prelude get pods
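As a convenience, the READY column can also be filtered with awk so that only the pods that are not yet ready are printed. The sample output below is hypothetical (pod names invented for illustration); on the appliance you would pipe the live `kubectl -n prelude get pods --no-headers` output into the same awk filter.

```shell
# Hypothetical sample of `kubectl -n prelude get pods` output, for illustration.
sample='NAME READY STATUS RESTARTS AGE
postgres-0 1/1 Running 0 30m
vco-app-abc 2/3 Running 0 30m'

# Print only pods whose READY count (x/y) is not yet complete.
# On the appliance: kubectl -n prelude get pods --no-headers | awk '{ ... }'
not_ready_pods=$(echo "$sample" | awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }')
echo "$not_ready_pods"
```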
vRA 8.4 still not working after all of the above steps
After trying all of the above, vRA sometimes still won't come back online after a failure. In that case, run the kubectl command above to check the status of the pods; if all pods are online except the postgres database pod, try the command below to restart the kubelet service. Once it has run, leave the appliance alone for around 30 minutes while vRA restarts itself and tries to come back online cleanly.
systemctl restart kubelet