ESXi Not Responding in vCenter: What to Do

I don’t think it’s an exaggeration to say that every VMware vSphere admin has, at least once, seen one or more VMware ESXi hosts showing as Not Responding in vCenter Server. A lot of things can cause this, so today we’re going to take a look at the most frequent ones.

First of all, verify that the ESXi host is in a powered ON state

Make sure it is powered on physically in the rack and is reachable via the remote console (iLO/iDRAC). The catch is that your host might have hit the infamous PSOD (Purple Screen of Death, AKA Purple Diagnostic Screen).

If that’s the case (I hope not), you’ll have to deal with this issue according to this VMware KB article. Once the host has booted, add it to vCenter Server again.

Now, if the ESXi host is powered ON but still shows as Not Responding, try restarting the management agents (Restart Management Agents in the DCUI)

These agents are responsible for synchronizing VMware components and granting access to the ESXi host through vCenter Server. As for how to restart the management agents, you can look it up here.

It won’t hurt to run Test Management Network as well. Any errors it reports may explain exactly what went wrong.

The next step is verifying that network connectivity exists from vCenter Server to the ESXi host (by both IP address and FQDN)

Although it seems obvious, you’d be surprised how many people forget to do it first. To do so, just run a ping test from your ESXi host to vCenter Server, and from vCenter Server back to the host.
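Since the check should cover both the IP and the FQDN, it can be worth confirming that the name actually resolves to the address you expect before you start pinging. Here is a minimal Python sketch of that idea, to be run from a machine that uses the same DNS as vCenter Server. The function names are made up for illustration and are not part of any VMware tool.

```python
import socket
from typing import List

def resolve_ipv4(name: str) -> List[str]:
    """Return the sorted IPv4 addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        # Name could not be resolved (bad DNS record, typo, no DNS server).
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})

def fqdn_matches_ip(fqdn: str, expected_ip: str) -> bool:
    """True if the FQDN resolves to the IP you believe the host has."""
    return expected_ip in resolve_ipv4(fqdn)
```

For example, calling fqdn_matches_ip with your host's FQDN and its management IP should return True. If it returns False, a stale or missing DNS record is a likely suspect.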

Verify that you can connect from vCenter Server to your ESXi host

The tricky thing about vCenter is that the ESXi host sends heartbeats, and vCenter Server has a window of 60 seconds to receive them. If it doesn’t receive a heartbeat from the host within 60 seconds, vCenter Server marks this ESXi host as Not Responding and eventually as Disconnected.

Sometimes it doesn’t work out because the ESXi host just can’t see a vCenter Server that sits behind NAT.
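To make the timeout rule concrete, here is a tiny Python sketch of the logic described above. It only illustrates the idea and is not vCenter's actual implementation. The function name and state strings are invented for the example, and the eventual transition to Disconnected is left out.

```python
NOT_RESPONDING_TIMEOUT = 60.0  # seconds, the default window described above

def host_state(last_heartbeat: float, now: float,
               timeout: float = NOT_RESPONDING_TIMEOUT) -> str:
    """Classify a host by how long ago its last heartbeat arrived.

    Both timestamps are in seconds on the same clock.
    """
    silence = now - last_heartbeat
    return "Not Responding" if silence > timeout else "Connected"
```

With the default window, a host that has been silent for 90 seconds is flagged. With a 120-second window, the same host would still count as connected.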

In such a scenario, the ESXi hosts won’t be able to connect to vCenter Server. Moreover, this configuration isn’t even supported by VMware, even though there is a workaround.

Well, if the above has happened to you, you now have to allow connections from the ESXi host to vCenter Server on port 902 (TCP/UDP).

You can easily test TCP connectivity on port 902 with Telnet.

The VMware Knowledge Base will come in handy here.
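If Telnet isn't installed where you need it, roughly the same check can be done with a few lines of Python. This is a minimal sketch with a made-up function name. Like Telnet, it only proves that a TCP connection to port 902 can be opened, and it says nothing about UDP traffic on that port.

```python
import socket

def tcp_port_open(host: str, port: int = 902, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Call it with the address of the machine you want to reach, for example tcp_port_open followed by your host's FQDN. A False result usually points to a firewall rule, a routing problem, or a service that isn't listening.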

By the way, on a congested network you can increase the 60-second heartbeat timeout to, say, 120 seconds if necessary. It’s easy: just change the config.vpxd.heartbeat.notrespondingtimeout parameter value in the vCenter Server Advanced Settings, as described here.

Try disconnecting your ESXi host from the vCenter Server inventory and then reconnecting it

There’s already a tutorial explaining how to do that. Just disconnect your ESXi host in vSphere Client.

After that, add the ESXi host to vCenter Server once more.

Nothing helped? Time for the logs

As the first step, look into the vpxa log file (/var/log/vpxa.log), as suggested here. If the cause of the trouble is a lack of service console memory allocated to the vCenter Server agent, you’ll see errors such as these in the vpxa log:

[2007-07-28 17:57:25.416 'Memory checker' 5458864 error] Current value 143700 exceeds hard limit 128000. Shutting down process.
[2007-07-28 17:57:25.420 'Memory checker' 3076453280 info] Resource checker stopped.
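Scrolling through a long log by eye is tedious, so here is a small Python sketch that picks out lines like the first one above. The pattern is based only on the sample shown here, so treat it as an assumption about the message format. The function name is made up for the example.

```python
import re

# Matches the "exceeds hard limit" message from the sample above.
HARD_LIMIT = re.compile(r"Current value (\d+) exceeds hard limit (\d+)")

def find_hard_limit_errors(lines):
    """Return (current_value, hard_limit) pairs for every matching log line."""
    hits = []
    for line in lines:
        match = HARD_LIMIT.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits
```

You can pass it an open file object for the vpxa log, since iterating over a file yields its lines. An empty result means no such errors were found.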

Also verify that the hostd service is running and responding to commands. Look into the hostd log file (/var/log/vmware/hostd.log), as suggested here. For example, you may find an error such as this one:

2014-06-27T19:57:41.000Z [282DFB70 info 'Vimsvc.ha-eventmgr'] Event 8002 : Issue detected on sg-pgh-srv2-esx10.sg-pgh.idealcloud.local in ha-datacenter: hostd detected to be non-responsive
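The same kind of quick scan works for the hostd log. The Python sketch below pulls the timestamp and host name out of events like the one above. Again, the pattern is inferred from this single sample line and may need adjusting for other ESXi versions. The function name is invented for illustration.

```python
import re

# Timestamp at line start, then the host named in the "Issue detected on" event.
NON_RESPONSIVE = re.compile(
    r"^(\S+) .*Issue detected on (\S+) in .*hostd detected to be non-responsive"
)

def find_hostd_non_responsive(lines):
    """Return (timestamp, host) pairs for hostd non-responsive events."""
    events = []
    for line in lines:
        match = NON_RESPONSIVE.match(line)
        if match:
            events.append((match.group(1), match.group(2)))
    return events
```

If the same host shows up again and again, the timestamps should help you line the events up with whatever else was happening on it at the time.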

Many things can lead to such an error, but the most common reason is that there aren’t enough resources for the hostd service on your host.

And last but not least: Make sure that the storage is alright

Let’s assume that you’ve already checked everything else, and it didn’t help. Now the only thing left to do is to check for storage issues on your ESXi host. All you need to know, including the troubleshooting scheme for this case, is here.

To sum up!

If your ESXi host is in the Not Responding state, start with the simplest things: verify that your host is powered on, run ping tests in both directions (don’t forget port 902!), restart the management agents, or reconnect your host to the vCenter inventory. Then move on to the more complicated issues, such as checking whether the vpxa agent and hostd service are up and running. Finally, see if there are storage problems (of which there can be plenty).
