VMware Cloud Community
Argyle
Enthusiast

Isolation address specified but default GW reported as issue in VC 2.0.2

After we upgraded Virtual Center to 2.0.2, the cluster (6 servers, ESX 3.0.1) reports an issue in Virtual Center like this:

----
Issue detected on esxhost11.mydomain.com in Cluster 01: Could not reach isolation address 10.x.x.254

Issue detected on esxhost12.mydomain.com in Cluster 01: Could not reach isolation address 10.x.x.254

Issue detected on esxhost13.mydomain.com in Cluster 01: Could not reach isolation address 10.x.x.254

....

----
10.x.x.254 is not pingable and never was. That's why we set the advanced option das.isolationaddress to another IP, 10.z.z.1 (a virtual IP on our 24/7 core routers). But now, whenever we reconfigure HA, this message stays up permanently in VC.

Checking aam_config_util_addnode.log in /opt/LGTOaam512/log/ we see the following:

----
Node        Type      State
----------  --------  --------------
esxhost11   Primary   Agent Running
esxhost12   Primary   Agent Running
esxhost13   Primary   Agent Running
....

Waiting for agent to come alive, status is : running

wait_agent_startup: elasped time 0 minute(s) and 21 second(s)

CMD: /bin/ping -c1 10.z.z.1

RESULT:

----
PING 10.z.z.1 (10.z.z.1) 56(84) bytes of data.

64 bytes from 10.z.z.1: icmp_seq=0 ttl=254 time=1.21 ms

--- 10.z.z.1 ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 0ms

rtt min/avg/max/mdev = 1.214/1.214/1.214/0.000 ms, pipe 2

CMD: /bin/ping -c1 10.x.x.254

RESULT:

----
PING 10.x.x.254 (10.x.x.254) 56(84) bytes of data.

--- 10.x.x.254 ping statistics ---

1 packets transmitted, 0 received, 100% packet loss, time 0ms

----
Both the isolation address and the default gateway are pinged. I assume this has always been the case (I can't see why a VC upgrade would impact AAM), and that VC 2.0.1 simply didn't treat the unreachable gateway as an issue or report it in the GUI.
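
To make the comparison between the two ping results explicit, here is a minimal Python sketch that reads the summary line of a `ping -c1` run and decides whether the address answered. The function name `ping_reachable` is hypothetical and the sample strings are copied from the log above; this only illustrates how the two results differ, it is not how AAM itself evaluates them.

```python
import re


def ping_reachable(ping_output: str) -> bool:
    """Return True if the ping summary reports at least one reply received."""
    # Matches the summary line, e.g. "1 packets transmitted, 0 received, ..."
    match = re.search(r"(\d+) packets transmitted, (\d+) received", ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    return int(match.group(2)) > 0


# Summary lines taken from aam_config_util_addnode.log above.
ISOLATION_RESULT = "1 packets transmitted, 1 received, 0% packet loss, time 0ms"
GATEWAY_RESULT = "1 packets transmitted, 0 received, 100% packet loss, time 0ms"

if __name__ == "__main__":
    print("10.z.z.1 reachable:", ping_reachable(ISOLATION_RESULT))
    print("10.x.x.254 reachable:", ping_reachable(GATEWAY_RESULT))
```

By this reading, the configured isolation address responds and only the default gateway fails, which matches the message shown in VC.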

Now with 2.0.2 the cluster is always marked red in VC and shows this entry for all six servers. Quite annoying but not fatal 😛

Does anyone know how to get rid of these "errors" showing up?

23 Replies
klich
Enthusiast

Thanks Marc,

I noticed the comments about increasing the time (das.failuredetectiontime) in the HA_Tech_Best_Practices document as well.

"The response time can be configured to be different than 15 seconds (15000 ms). 60 seconds (60000 ms) is an alternative commonly used."

"The default timeout value should also be increased to 20 seconds (20000 ms) or greater when a secondary isolation address has been specified"
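
Since das.failuredetectiontime is given in milliseconds, a small Python sketch of the conversion may help avoid slips when entering the value. The helper name `failure_detection_ms` is hypothetical; the 15, 20 and 60 second figures are the ones quoted above.

```python
def failure_detection_ms(seconds: int) -> int:
    """Convert a timeout in seconds to the millisecond value the option expects."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return seconds * 1000


if __name__ == "__main__":
    # Default, suggested minimum with a secondary isolation address, common alternative.
    for seconds in (15, 20, 60):
        print(f"{seconds} s -> das.failuredetectiontime = {failure_detection_ms(seconds)}")
```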

Kevin

ReggieSmith
Enthusiast

I just experienced this same issue, but only on one ESX host. I had not set a manual isolation address, so I set both the das.isolationaddress and das.usedefaultisolationaddress options as mentioned in this post, disabled HA on the cluster, then re-enabled it.

Error is gone.. for now....


c0d3rZ
Contributor

I have configured das.isolationaddress and das.isolationaddress2 in the advanced options of VMware HA.

I can ping both isolation addresses from both ESX servers, but I still get the error: could not reach isolation address "das.isolationaddress", "das.isolationaddress2"

VC 2.0.2

ESX01 3.0.1

ESX02 3.0.2

admin
Immortal

Did you set das.usedefaultisolationaddress to false? Or is the default GW for both service console interfaces also pingable?
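
As a rough way to reason about which addresses end up being checked, here is a small Python sketch. It assumes, as discussed in this thread, that the default gateway is included unless das.usedefaultisolationaddress is set to false, and that das.isolationaddress and das.isolationaddress2 are added on top. The function `isolation_addresses` and the ordering of the result are hypothetical; this models the behavior described here, not the actual HA agent logic.

```python
def isolation_addresses(options: dict, default_gateway: str) -> list:
    """List the addresses expected to be checked for a given set of advanced options."""
    addresses = []
    # Assumption: the default gateway is used unless explicitly disabled.
    use_default = str(options.get("das.usedefaultisolationaddress", "true")).lower() != "false"
    if use_default:
        addresses.append(default_gateway)
    # Add any manually specified isolation addresses.
    for key in ("das.isolationaddress", "das.isolationaddress2"):
        if key in options:
            addresses.append(options[key])
    return addresses


if __name__ == "__main__":
    gateway = "10.x.x.254"  # placeholder from the original post
    only_custom = {"das.isolationaddress": "10.z.z.1"}
    custom_no_default = {
        "das.isolationaddress": "10.z.z.1",
        "das.usedefaultisolationaddress": "false",
    }
    print(isolation_addresses(only_custom, gateway))
    print(isolation_addresses(custom_no_default, gateway))
```

Under these assumptions, setting only das.isolationaddress still leaves the unreachable gateway in the list, which would explain why the message keeps appearing.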
