I don’t think it’s an exaggeration to say that every VMware vSphere admin has, at least once, seen one or more VMware ESXi hosts showing as Not Responding in vCenter Server. Many things can cause this, so today we’re going to look at the most frequent ones.

First of all, verify that the ESXi host is in a powered ON state

Make sure the host is both physically powered on in the rack and reachable through its remote console (iLO/iDRAC). Your host might have hit the infamous PSOD (Purple Screen of Death, AKA Purple Diagnostic Screen).

If that’s the case (I hope not), you’ll have to troubleshoot it following this VMware KB article. Once the host has booted again, add it back to vCenter Server.

If the ESXi host is powered ON but still shows as Not Responding, try restarting the management agents (Restart Management Agents)

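The management agents (hostd and vpxa) are responsible for keeping the ESXi host in sync with vCenter Server and for giving vCenter Server access to the host. You can restart them from the DCUI (Troubleshooting Options > Restart Management Agents) or from the ESXi Shell; you can look up the details here.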
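
For the ESXi Shell route, the usual approach is to restart hostd and then vpxa through their init scripts. The Python sketch below only wraps those two commands. It assumes it runs in the ESXi Shell (or over SSH) where a Python interpreter is available, and the function name restart_management_agents is hypothetical. By default it performs a dry run and just returns the commands it would execute.

```python
import subprocess

# Init scripts for the two management agents, restarted in this order: hostd first, then vpxa.
AGENT_SCRIPTS = ("/etc/init.d/hostd", "/etc/init.d/vpxa")


def restart_management_agents(dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) the restart commands for hostd and vpxa.

    Hypothetical helper for illustration. With dry_run=True nothing is executed.
    """
    commands = [[script, "restart"] for script in AGENT_SCRIPTS]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if an agent fails to restart.
            subprocess.run(cmd, check=True)
    return commands
```

Calling restart_management_agents(dry_run=False) on the host would actually restart the agents, which briefly interrupts vCenter Server’s connection to that host.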

It won’t hurt to run Test Management Network as well. It pings the default gateway and the configured DNS servers and tries to resolve the host name, so any errors it reports may point to exactly what went wrong.

The next step is verifying that network connectivity exists from vCenter Server to the ESXi host (both with the IP and FQDN)

Although it seems obvious, you’d be surprised how many people forget to do it. Ping the ESXi host from vCenter Server by both its IP address and its FQDN, and then ping vCenter Server from the ESXi host to check the opposite direction.
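
Checking the FQDN is really a DNS question: does the host’s name resolve to the IP address you expect? A minimal Python sketch of that check, run from the vCenter Server side, might look like this. The helper names and the example host name and address are hypothetical, and ICMP ping itself is left to the ping command, since sending raw ICMP packets usually requires elevated privileges.

```python
import socket


def resolved_addresses(name: str) -> set[str]:
    """Return every address (IPv4 and IPv6) that `name` resolves to."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def fqdn_matches_ip(fqdn: str, expected_ip: str) -> bool:
    """True if forward DNS for `fqdn` includes `expected_ip`."""
    try:
        return expected_ip in resolved_addresses(fqdn)
    except socket.gaierror:
        # The name did not resolve at all: a DNS problem, so no match.
        return False
```

For example, fqdn_matches_ip("esxi01.example.local", "192.168.1.10") returning False means the ping by FQDN and the ping by IP may not even be reaching the same machine.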

The tricky part is that the ESXi host sends heartbeats to vCenter Server, and vCenter Server has a 60-second window to receive them. If it doesn’t receive a heartbeat from the host within 60 seconds, vCenter Server marks this ESXi host as Not Responding and eventually as Disconnected.
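
To make the timing concrete, here is a toy Python model of that window. It is not how vCenter Server is implemented; it only assumes, as described above, that a host counts as Not Responding once no heartbeat has arrived for longer than the timeout (60 seconds by default). The class HeartbeatMonitor is hypothetical, and its timeout_s field plays the role of the notrespondingtimeout setting mentioned later in this article.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_TIMEOUT_S = 60.0  # default heartbeat window described above


@dataclass
class HeartbeatMonitor:
    """Toy model of the vCenter-side heartbeat window (hypothetical, for illustration)."""

    timeout_s: float = DEFAULT_TIMEOUT_S
    last_heartbeat_s: Optional[float] = None

    def record(self, ts_s: float) -> None:
        # Called whenever a heartbeat from the host arrives.
        self.last_heartbeat_s = ts_s

    def state(self, now_s: float) -> str:
        # Not Responding if no heartbeat was ever seen or the window has expired.
        if self.last_heartbeat_s is None or now_s - self.last_heartbeat_s > self.timeout_s:
            return "Not Responding"
        return "Connected"
```

With the default window, a 61-second gap between heartbeats flips the host to Not Responding; with timeout_s=120.0 the same gap still counts as Connected, which is why raising the timeout can help on a congested network.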

Sometimes the problem is simply that the ESXi host can’t reach a vCenter Server that sits behind NAT.

In such a scenario, the ESXi hosts won’t be able to connect to vCenter Server. Moreover, VMware doesn’t support this configuration, although a workaround exists.

If that’s your situation, allow connections from the ESXi host to vCenter Server on port 902 (TCP and UDP); the heartbeats themselves are sent over UDP 902.

You can quickly test TCP connectivity on port 902 with Telnet. Keep in mind that Telnet only tests TCP, not the UDP heartbeat traffic.
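
If Telnet isn’t available, the same TCP check is easy to script. The Python sketch below (the function name tcp_port_open and the example host name are hypothetical) just attempts a TCP connection with a timeout. Like Telnet, it says nothing about whether UDP 902 gets through.

```python
import socket


def tcp_port_open(host: str, port: int = 902, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection handles DNS resolution and tries each resolved address.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable networks.
        return False
```

Run from the ESXi host side, tcp_port_open("vcenter.example.local") returning False points at a firewall or NAT problem between the host and vCenter Server.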

The VMware Knowledge Base will come in handy here.

By the way, on a congested network you can raise the 60-second heartbeat timeout to, say, 120 seconds if necessary. It’s easy: just change the value of the config.vpxd.heartbeat.notrespondingtimeout parameter in the vCenter Server Advanced Settings, as described here.

Try disconnecting your ESXi host from vCenter Server inventory and then connecting back

There’s already a tutorial explaining how to do that. In short, disconnect your ESXi host in vSphere Client (Connection > Disconnect in the host’s context menu).

After that, connect the ESXi host back to vCenter Server (or add it to the inventory once more).

Nothing helped? Time for logs

First, look into the vpxa log (/var/log/vpxa.log), as suggested here. If the problem is that too little service console memory is allocated to the vCenter Server agent (vpxa), you’ll see errors like these in the log:

[2007-07-28 17:57:25.416 ‘Memory checker’ 5458864 error] Current value 143700 exceeds hard limit 128000. Shutting down process.
[2007-07-28 17:57:25.420 ‘Memory checker’ 3076453280 info] Resource checker stopped.
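
On a busy host the log can be long, so a small filter helps. The Python sketch below is a hypothetical helper that matches only the “exceeds hard limit” message shown above; the regular expression is written against these sample lines and is an assumption about the log format, not a documented one.

```python
import re
from typing import Iterable

# Matches lines like: [2007-07-28 17:57:25.416 ... error] Current value 143700 exceeds hard limit 128000.
MEMORY_LIMIT_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} [\d:.]+).*?Current value (\d+) exceeds hard limit (\d+)"
)


def find_memory_limit_errors(lines: Iterable[str]) -> list[tuple[str, int, int]]:
    """Return (timestamp, current_value, hard_limit) for each memory-limit error."""
    hits = []
    for line in lines:
        m = MEMORY_LIMIT_RE.search(line)
        if m:
            hits.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return hits
```

Usage, on the host: with open("/var/log/vpxa.log") as f: print(find_memory_limit_errors(f)).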

Also verify that the hostd service is running and responds to commands. Check the hostd log file (/var/log/vmware/hostd.log), as suggested here. For example, you might find an error like this one:

2014-06-27T19:57:41.000Z [282DFB70 info ‘Vimsvc.ha-eventmgr’] Event 8002 : Issue detected on sg-pgh-srv2-esx10.sg-pgh.idealcloud.local in ha-datacenter: hostd detected to be non-responsive
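
To find when, and on which host, hostd was flagged, the same filtering idea applies. The Python sketch below is hypothetical and keys on the wording of the sample event above; other ESXi builds may phrase the message differently.

```python
import re
from typing import Iterable

# Matches the "hostd detected to be non-responsive" event and captures timestamp and host name.
HOSTD_EVENT_RE = re.compile(
    r"^(\S+) .*?Issue detected on (\S+) in .*hostd detected to be non-responsive"
)


def find_hostd_nonresponsive(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Return (timestamp, host_name) for each non-responsive hostd event."""
    events = []
    for line in lines:
        m = HOSTD_EVENT_RE.search(line)
        if m:
            events.append((m.group(1), m.group(2)))
    return events
```

Correlating these timestamps with the moments the host showed as Not Responding in vCenter Server tells you whether hostd was the culprit.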

Many things can cause this error, but the most common reason is that the host doesn’t have enough resources left for the hostd service.

And last but not least: Make sure that the storage is alright

Let’s assume you’ve already checked everything else and nothing helped. The only thing left is to check your ESXi host for storage issues. All you need to know is here. In this case, the troubleshooting scheme looks like this:

[Figure: storage troubleshooting scheme]

To sum up

If your ESXi host is in the Not Responding state, start with the simplest things: verify that the host is powered on, run ping tests in both directions (don’t forget port 902!), restart the management agents, or reconnect the host to the vCenter inventory. Then move on to the more complicated issues, such as checking whether the vpxa agent and the hostd service are up and running. Finally, check for storage problems (there can be plenty of them).

Alex Samoylenko
Virtualization technology professional. 10 years ago he built the #1 website on virtualization in Russia. Alex runs his own virtualization-focused company, VMC. He is CEO of the mobile game publisher Nova Games and CEO of an international dating site