Wednesday, February 1, 2012

VMware vSphere Host(s) suddenly show as disconnected


I had an issue where my vSphere hosts would suddenly appear as disconnected in vCenter.  I am running vSphere 5 (ESXi) on Cisco C200-M2 servers.  Every guest that was running on the affected host would show as disconnected as well.  This was a ticking time bomb: if I left the host in this state, the guests would eventually crash and the host would stop responding altogether.  The heavier the load on the server, the quicker this would occur.  In my environment I have 5 servers, and with an even load a host would crash approximately every 5-7 days, a different host each time.  When I put 2 hosts in maintenance mode to do some troubleshooting, which pushed the load onto the remaining 3, one crashed in under 36 hours.

This was a really painful issue, because I would have to run around the floor telling users to save their work before I power cycled the host from the remote access card, which hard crashed all of the desktops and servers running on it.  Keep in mind that I am 99% virtual here, desktops and servers, so this was very painful.  The only ways I knew there was an issue were seeing the disconnected state in vCenter, a user rebooting their VDI desktop and it not coming back online, or the guests eventually crashing and triggering an alert.  The monitors that we have in place were unable to detect this scenario and let me know that a host was in this zombie state.

VMware and Cisco worked on this issue for a few weeks.  Three weeks ago, Cisco pointed me to the following KB from VMware: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1030265

I followed the PowerCLI method from that article (I must have missed the console method by accident) and the issue still occurred.  After a few weeks of working with VMware, it was determined that the PowerCLI method was written incorrectly and the console method was written correctly.  After I ran the console method the problem was resolved, and VMware has also just updated the article to correct the PowerCLI method.
