Troubleshooting Adding an ESX Server Host to a VMware High Availability Cluster

Tuesday, March 3, 2009

KB Article 1001596
Updated Jan. 23, 2009
Products
VMware ESX
VMware VirtualCenter
Details
You are unable to add an ESX Server host to a VMware High Availability (HA) cluster. This article provides steps to:
  • Troubleshoot an ESX Server that cannot be added to a cluster
  • Troubleshoot VMware HA configuration errors that are reported on the cluster and are not resolved by running Reconfigure for VMware HA
A variety of error messages relate to VMware HA configuration problems. For example, when VMware HA fails to start, you may receive the following error:

gethostbyname error:2
 
Note: This document assumes that you have already verified that you have enough licenses for VMware HA (and VMware DRS if it has been configured) for the ESX Server that you are trying to add to the cluster.

Solution
Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document to help eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

Step 1 – Creating a new cluster for testing

  1. Log in to the Virtual Infrastructure Client as a user with administrative rights.
  2. Right-click on your datacenter object.
  3. Click New Cluster.
  4. In the New Cluster Wizard, specify a Name for the cluster.
  5. Select VMware HA.
  6. Click Next.
  7. Click Next.
  8. If you are using VMware VirtualCenter 2.5.x or later, select the option Store the swapfile in the same directory as the virtual machine. Otherwise, proceed to step 10 of this procedure.
  9. Click Next.
  10. Click Finish.
  11. Attempt to add the host to the new cluster.

Step 2 – Reinstalling the VMware HA components

To reinstall the VMware HA components on VirtualCenter 2.0.x:

  1. Remove the ESX Server from VirtualCenter.
  2. After the host has been removed from VirtualCenter, open a connection to the ESX Server service console.
  3. Type rpm -qa | grep -i lgto and press Enter.
  4. This returns two packages with names similar to LGTOaama-#.#.#-# and LGTOaamvm-#.#.#-#.
  5. Remove one of these packages by typing rpm -e followed by its name.
  6. Repeat the previous step for the other package.
  7. Type rpm -qa | grep -i vpxa and press Enter. A package named VMware-vpxa-#.#.#-##### is returned.
  8. Remove this package by typing rpm -e followed by the package name.
  9. Test adding the host to the newly created cluster to see if this has resolved the issue.
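The package removal in steps 3 to 8 can also be scripted. The following is a minimal sketch, not a VMware-supplied tool: it assumes a bash service console with rpm available, that it is run as root after the host has been removed from VirtualCenter, and it uses a hypothetical helper name, remove_matching_packages. Review the output of rpm -qa | grep -i <pattern> by hand first, because the helper removes every package whose name contains the pattern.

```shell
#!/bin/bash
# Sketch only: remove_matching_packages is a hypothetical helper, not a VMware tool.
# It mirrors the manual steps: list packages with "rpm -qa | grep -i <pattern>",
# then remove each match with "rpm -e <package>".
# Run as root on the service console AFTER removing the host from VirtualCenter.

remove_matching_packages() {
    local pattern="$1"
    local pkg
    # Case-insensitive match on the installed package list.
    for pkg in $(rpm -qa | grep -i -- "$pattern"); do
        echo "Removing $pkg"
        # Stop at the first failure so the remaining state can be inspected.
        rpm -e "$pkg" || return 1
    done
}
```

On VirtualCenter 2.0.x you would call remove_matching_packages lgto and then remove_matching_packages vpxa. The packages are expected to be pushed to the host again by VirtualCenter when the host is re-added.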

To reinstall the VMware HA components on VirtualCenter 2.5.x:

  1. Remove the ESX Server from VirtualCenter.
  2. After the host has been removed from VirtualCenter, open a connection to the ESX Server service console.
  3. Type rpm -qa | grep -i aam and press Enter.
  4. This returns two packages with names similar to VMware-aam-haa-#.#.#-# and VMware-aam-vcint-#.#.#-#.
  5. Remove one of these packages by typing rpm -e followed by its name.
  6. Repeat the previous step for the other package.
  7. Type rpm -qa | grep -i vpxa and press Enter. A package named VMware-vpxa-#.#.#-##### is returned.
  8. Remove this package by typing rpm -e followed by the package name.
  9. Test adding the host to the newly created cluster to see if this has resolved the issue.
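Before testing the add again, it can help to confirm that the removal really left no HA or agent packages behind. A minimal sketch, assuming bash and rpm on the service console; check_no_leftovers is a hypothetical helper name and it only queries the RPM database, it does not change anything:

```shell
#!/bin/bash
# Sketch only: check_no_leftovers is a hypothetical helper.
# Returns success only if no installed package name contains any of the
# given patterns (case-insensitive), e.g. "aam" and "vpxa".

check_no_leftovers() {
    local pattern leftovers status=0
    for pattern in "$@"; do
        leftovers=$(rpm -qa | grep -i -- "$pattern")
        if [ -n "$leftovers" ]; then
            echo "Still installed ($pattern): $leftovers"
            status=1
        fi
    done
    return $status
}
```

For example, check_no_leftovers aam vpxa on VirtualCenter 2.5.x (or check_no_leftovers lgto vpxa on 2.0.x) succeeds only when no matching packages remain.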

Step 3 – Correcting issues with ESX Server host name configuration and name resolution

To function correctly, VMware HA requires that DNS and the ESX Server report the same host name for the host being configured. The following explains how to check and change the host name and the /etc/hosts file on the ESX Server so that DNS, the host name setting, and /etc/hosts all report the same information.

Note: As of VirtualCenter 2.0.2, host names are case sensitive. Host names must be lowercase in the /etc/hosts file on your ESX Server hosts.

Use the commands hostname and hostname -s to display the Fully Qualified Domain Name (FQDN) and the short host name of the server. Compare this output with the values set in the /etc/sysconfig/network and /etc/hosts files, and edit the files as needed to change any uppercase characters to lowercase. To change the host name dynamically without rebooting the ESX Server host, use the command hostname <lowercasename>, where <lowercasename> is the server's host name in lowercase characters.
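The comparison described above can be automated. A minimal sketch, assuming bash and awk on the service console; check_hostname_config is a hypothetical helper, and the file paths are passed as arguments so it can also be run against copies of the files. It assumes that hostname returns the FQDN and that /etc/sysconfig/network stores the name on a HOSTNAME= line:

```shell
#!/bin/bash
# Sketch only: check_hostname_config is a hypothetical helper.
# Usage: check_hostname_config <fqdn> <network-file> <hosts-file>
# Prints one line per problem found; returns non-zero if any problem exists.

check_hostname_config() {
    local fqdn="$1" net_file="$2" hosts_file="$3"
    local short="${fqdn%%.*}" net_name bad name status=0

    # 1. The host name itself must be lowercase (VirtualCenter 2.0.2 and later).
    if [ "$fqdn" != "$(echo "$fqdn" | tr '[:upper:]' '[:lower:]')" ]; then
        echo "Host name '$fqdn' contains uppercase characters"
        status=1
    fi

    # 2. HOSTNAME= in the network file should equal the FQDN (quotes stripped).
    net_name=$(grep '^HOSTNAME=' "$net_file" | cut -d= -f2 | tr -d '"')
    if [ "$net_name" != "$fqdn" ]; then
        echo "HOSTNAME in $net_file is '$net_name', expected '$fqdn'"
        status=1
    fi

    # 3. The hosts file should list both the FQDN and the short name
    #    (exact, case-sensitive match on non-comment lines).
    for name in "$fqdn" "$short"; do
        if ! awk -v h="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "$name not found in $hosts_file"
            status=1
        fi
    done

    # 4. Flag any host name in the hosts file that contains uppercase letters.
    bad=$(awk '$1 !~ /^#/ { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i ~ /[[:upper:]]/) print $i } }' "$hosts_file")
    if [ -n "$bad" ]; then
        echo "Uppercase names in $hosts_file: $bad"
        status=1
    fi
    return $status
}
```

Run it as check_hostname_config "$(hostname)" /etc/sysconfig/network /etc/hosts; each line it prints points to a value that needs editing.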
 
If you made any modifications, try adding the ESX Server back into the cluster or run Reconfigure for VMware HA, and check whether the configuration issue has been resolved.

Step 4 – Verifying host name resolution between ESX Servers

From the service console command shell of every ESX Server host, use the ping command to check connectivity to every other ESX Server host in the cluster. Ping each host by its short host name, FQDN, and IP address. Also check the connectivity of every ESX Server host to the isolation address of the cluster, which by default is the default gateway of the service console.

Note: VMware HA relies extensively on DNS, so both forward and reverse DNS lookups must work correctly. Check this on every ESX Server host in the cluster.
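One way to check forward and reverse resolution from a given host is with getent, which queries the resolvers configured in /etc/nsswitch.conf. This is a sketch under assumptions: it assumes getent is present on the service console, and because nsswitch is typically configured to consult /etc/hosts before DNS, a matching /etc/hosts entry can mask a DNS problem. To test DNS on its own, query the DNS server directly (for example with nslookup, if it is installed). check_forward_reverse is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: check_forward_reverse is a hypothetical helper.
# Resolves name -> IP -> name through the system resolvers and checks
# that the reverse lookup returns exactly the same (case-sensitive) FQDN.

check_forward_reverse() {
    local fqdn="$1" ip back
    # Forward lookup: take the first address returned for the name.
    ip=$(getent hosts "$fqdn" | awk '{ print $1; exit }')
    if [ -z "$ip" ]; then
        echo "$fqdn: forward lookup failed"
        return 1
    fi
    # Reverse lookup: take the canonical name returned for that address.
    back=$(getent hosts "$ip" | awk '{ print $2; exit }')
    if [ "$back" != "$fqdn" ]; then
        echo "$fqdn -> $ip -> '$back': reverse lookup does not match"
        return 1
    fi
    echo "$fqdn -> $ip -> $back: OK"
}
```

Run it on every host for the FQDN of every host in the cluster, for example check_forward_reverse esxhost2.domainname.com.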

If ping fails, resolve the connectivity issue before expecting VMware HA to function properly. The most common problem is that ping works by IP address but not by name, which indicates that name resolution is not functioning properly.
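The distinction between failing by IP address and failing by name can be made explicit in a small script. A minimal sketch, assuming bash and the Linux ping command with its -c (count) option; check_peer is a hypothetical helper, and the host names and addresses in the usage note are the example values from the sample hosts file in this article:

```shell
#!/bin/bash
# Sketch only: check_peer is a hypothetical helper.
# Usage: check_peer <short-name> <fqdn> <ip>
# Pings the peer once by IP, short name and FQDN and reports which kind
# of failure (if any) was seen.

check_peer() {
    local short="$1" fqdn="$2" ip="$3"
    local ip_ok=0 name_ok=1 name

    # One echo request by IP address; only the exit status matters.
    ping -c 1 "$ip" > /dev/null 2>&1 && ip_ok=1

    # One echo request per name; any failure marks name resolution as bad.
    for name in "$short" "$fqdn"; do
        ping -c 1 "$name" > /dev/null 2>&1 || name_ok=0
    done

    if [ "$ip_ok" -eq 1 ] && [ "$name_ok" -eq 1 ]; then
        echo "$short: OK"
        return 0
    elif [ "$ip_ok" -eq 1 ]; then
        echo "$short: reachable by IP but not by name, check name resolution"
        return 1
    else
        echo "$short: not reachable by IP, check network connectivity"
        return 1
    fi
}
```

For example, check_peer esxhost2 esxhost2.domainname.com 10.0.0.2. Run it from every host against every other host, and ping the isolation address separately with ping -c 1 <address>.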

First, verify that the correct DNS servers are referenced in the /etc/resolv.conf file. If the servers are incorrect, correct them and then try pinging the hosts again.

A sample resolv.conf file:

----begin file----------
nameserver 10.0.0.29
nameserver 10.0.0.30
search domainname.com
----end of file----------
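To confirm that a host actually lists the intended DNS servers, its nameserver lines can be compared against the servers you expect. A minimal sketch, assuming bash and awk; check_resolv_conf is a hypothetical helper, and the expected addresses are supplied by you rather than read from anywhere:

```shell
#!/bin/bash
# Sketch only: check_resolv_conf is a hypothetical helper.
# Usage: check_resolv_conf <resolv.conf path> <expected-dns-ip>...
# Fails if the file has no nameserver lines or if any expected server is missing.

check_resolv_conf() {
    local conf="$1"
    shift
    local ns configured missing=0

    # Collect the addresses from "nameserver" lines only.
    configured=$(awk '$1 == "nameserver" { print $2 }' "$conf")
    if [ -z "$configured" ]; then
        echo "No nameserver entries in $conf"
        return 1
    fi

    # Fixed-string, whole-line match so "10.0.0.3" does not match "10.0.0.30".
    for ns in "$@"; do
        if ! echo "$configured" | grep -qxF -- "$ns"; then
            echo "Expected DNS server $ns is not listed in $conf"
            missing=1
        fi
    done
    return $missing
}
```

For the sample file above: check_resolv_conf /etc/resolv.conf 10.0.0.29 10.0.0.30.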


DNS issues can also be resolved by adding entries for every cluster host to the /etc/hosts file on all ESX Server hosts in the cluster, including the hosts that are not reporting an issue. Name resolution errors can exist on a host that has not reported any issue and appears to be working correctly.

Sample hosts file:

----begin file----------
127.0.0.1 localhost.localdomain localhost
10.0.0.1 esxhost1.domainname.com esxhost1
10.0.0.2 esxhost2.domainname.com esxhost2
10.0.0.3 esxhost3.domainname.com esxhost3
----end of file----------
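Because the /etc/hosts entries should be identical on every host in the cluster and host names must be lowercase, it can be less error-prone to generate the entries from a single list. A minimal sketch; make_hosts_entries is a hypothetical helper that prints lines in the same format as the sample above, using the first label of each FQDN as the short name:

```shell
#!/bin/bash
# Sketch only: make_hosts_entries is a hypothetical helper.
# Usage: make_hosts_entries <ip> <fqdn> [<ip> <fqdn> ...]
# Prints a localhost line followed by "ip fqdn shortname" for each pair,
# forcing names to lowercase.

make_hosts_entries() {
    local ip fqdn
    echo "127.0.0.1 localhost.localdomain localhost"
    while [ $# -ge 2 ]; do
        ip="$1"
        fqdn=$(echo "$2" | tr '[:upper:]' '[:lower:]')
        # Short name = everything before the first dot of the FQDN.
        echo "$ip $fqdn ${fqdn%%.*}"
        shift 2
    done
}
```

For example, make_hosts_entries 10.0.0.1 esxhost1.domainname.com 10.0.0.2 esxhost2.domainname.com 10.0.0.3 esxhost3.domainname.com > /tmp/hosts.new. Review the output, back up the existing /etc/hosts, and then copy the file into place on each host; the sketch does not carry over any other entries the existing file may contain.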

After changing the file, retry pinging between the servers. If the ESX Server hosts can ping each other, try adding the server back to the cluster.
 
 
Note: If your problem still exists after trying the steps in this article:

Product Versions
VMware ESX 3.0.x
VMware ESX 3.5.x
VMware VirtualCenter 2.0.x
VMware VirtualCenter 2.5.x
