VMware vCenter: Host state “not responding” flapping

This posting is ~5 years years old. You should keep this in mind. IT is a short living business. This information might be outdated.

While I was onsite at a customer to decommission an old storage system, one of my very first tasks was to unmount and detach some old datastores. No big deal, until I saw that one after one ESXi hosts went to “not responding”. Time for a heart attack but hey: Why should a host ran into a PDL/ APD, while I was dismounting datastores on the vSphere layer? The LUNs were still there and accessible. The hosts came back quickly and from that point, I watched the hosts flapping between “connected” and “not responding”. Time for an investigation. My first thought was that it must have something to do with the network. But the network was okay, no problems with interfaces, (M/R)STP or similar. Then I checked the logs and found this

and this

in the sms.log on the vCenter server.  This messages was logged multiple times per second in the Windows Server eventlog:

A short search in the Microsoft Knowledgebase pointed me to KB2577795.  Checking the number of current connections showed, that the server opens hundreds and thousands of connections to itself. Unfortunately, the installation of KB2577795 did not solved the problem. My second attempted was to rise the number of MaxUserPort which is used to limit the number of dynamic ports available to applications. Open the registry editor and locate the key:

Create a new DWORD wamed MaxUserPort and add the value 65534 to it.

After a reboot of the vCenter server, the host state flapping has stopped and and did not recur. This is NOT the solution, it’s only a dirty workaround. The customer and I decided to reinstall the virtual vCenter Server and update it to vCenter 5.5 U2. The reinstallation wasn’t a big deal due to an external database and a backup of the SSL certificates.

As I already mentioned: I don’t have a clue what happened to the vCenter server. I was able to create a workaround but I wasn’t able to solve to root cause. But the customer used this situation to make a clear cut and to start with a new vCenter server.

Follow me

Patrick Terlisten

vcloudnine.de is the personal blog of Patrick Terlisten. Patrick has a strong focus on virtualization & cloud solutions, but also storage, networking, and IT infrastructure in general. He is a fan of Lean Management and agile methods, and practices continuous improvement whereever it is possible.

Feel free to follow him on Twitter and/ or leave a comment.
Patrick Terlisten
Follow me

Leave a Reply

Your email address will not be published. Required fields are marked *

I accept!