VMware Cloud Community
Argyle
Enthusiast

Isolation address specified but default GW reported as issue in VC 2.0.2

After we upgraded Virtual Center to 2.0.2, the cluster (6 servers, ESX 3.0.1) reports issues in Virtual Center like this:

----


Issue detected on esxhost11.mydomain.com in Cluster 01: Could not reach isolation address 10.x.x.254

Issue detected on esxhost12.mydomain.com in Cluster 01: Could not reach isolation address 10.x.x.254

Issue detected on esxhost13.mydomain.com in Cluster 01: Could not reach isolation address 10.x.x.254

....

----


10.x.x.254 is not pingable and never was. That's why we set the advanced option das.isolationaddress to another IP, 10.z.z.1 (a virtual IP on 24/7 core routers). But now, whenever we reconfigure HA, this message shows up permanently in VC.
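For reference, this is roughly how we set it as a cluster-level HA advanced option (the IP below is just a placeholder for our core-router VIP):

----

das.isolationaddress = 10.z.z.1

----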

Checking aam_config_util_addnode.log in /opt/LGTOaam512/log/ we see the following:

----

Node        Type      State
---------   -------   -------------
esxhost11   Primary   Agent Running

esxhost12   Primary   Agent Running

esxhost13   Primary   Agent Running

....

Waiting for agent to come alive, status is : running

wait_agent_startup: elasped time 0 minute(s) and 21 second(s)

CMD: /bin/ping -c1 10.z.z.1

RESULT:

----


PING 10.z.z.1 (10.z.z.1) 56(84) bytes of data.

64 bytes from 10.z.z.1: icmp_seq=0 ttl=254 time=1.21 ms

\--- 10.z.z.1 ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 0ms

rtt min/avg/max/mdev = 1.214/1.214/1.214/0.000 ms, pipe 2

CMD: /bin/ping -c1 10.x.x.254

RESULT:

----


PING 10.x.x.254 (10.x.x.254) 56(84) bytes of data.

\--- 10.x.x.254 ping statistics ---

1 packets transmitted, 0 received, 100% packet loss, time 0ms

----


Both the isolation address and the default gateway are pinged. I assume this has always been the case (I can't see why a VC upgrade would impact AAM); it's just that VC 2.0.1 didn't show the gateway failure as an issue in the GUI.
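If anyone wants to see the same checks on another host without reconfiguring anything, grepping the AAM log is enough (path as on our ESX 3.0.x installs):

----

# show the ping checks AAM recorded during the last HA reconfigure
grep -B1 -A8 "/bin/ping -c1" /opt/LGTOaam512/log/aam_config_util_addnode.log

----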

Now with 2.0.2 the cluster is always marked red in VC and shows this entry for all six servers. Quite annoying but not fatal 😛

Does anyone know how to get rid of these "errors" showing up?

1 Solution

Accepted Solutions
TuukkaK
Enthusiast

After you upgrade to VC 2.0.2 the HA configuration is changed. If you have specified an alternate isolation address, the DG is added to the list. I tried to add 2 alternate isolation IPs, but it only accepts one and the other one is forced to the DG. So I can't get rid of the warning in HA. I don't know if this is a bug or by design. In VC 2.0.1 you could just change the IP and the DG was removed.


23 Replies
masaki
Virtuoso

Did you try disabling HA, removing all the hosts from the cluster (or creating a new one) and re-enabling HA?

AAM is so delicate.

There could be some old file somewhere.

Texiwill
Leadership

Hello,

Do not forget to look at the HA logs in /opt/LGTOaam512/log for assistance.

Best regards,

Edward

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
Argyle
Enthusiast

I've tried removing and adding HA again and it always complains about the default gateway even if the isolation address is specified. I will upgrade to 3.0.2 and see if it helps.

TuukkaK
Enthusiast

I get the same "error" with a fresh install of ESX 3.0.2 and VC 2.0.2. After adding another isolation IP the cluster only has a yellow warning, but it's quite annoying...

Argyle
Enthusiast

Indeed, upgrading to 3.0.2 didn't help, so I assume it's a new thing in VC 2.0.2: it no longer filters out the gateway error even if you have an isolation address specified.

TuukkaK
Enthusiast

I'm just testing this. VC 2.0.1 with ESX 3.0.1 or ESX 3.0.2 works like it's supposed to: the isolation address is changed in /opt/LGTOaam512/bin/aam_config_util.def. Now I'll test it with VC 2.0.2 to see if it just adds another isolation IP. I'll report back how it turns out...
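To see what actually ends up in that file after a reconfigure, something like this is enough (the exact stanza layout may differ between builds):

----

# list every isolation-related line in the AAM config definition
grep -n -i "isolation" /opt/LGTOaam512/bin/aam_config_util.def

----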

masaki
Virtuoso

We got the same issue.

IMHO this is not an error (HA should work anyway if you set an isolation address) but new information from VC 2.0.2, because the DG is not pingable.

The solution is to enable ICMP on the DG.

TuukkaK
Enthusiast

After you upgrade to VC 2.0.2 the HA configuration is changed. If you have specified an alternate isolation address, the DG is added to the list. I tried to add 2 alternate isolation IPs, but it only accepts one and the other one is forced to the DG. So I can't get rid of the warning in HA. I don't know if this is a bug or by design. In VC 2.0.1 you could just change the IP and the DG was removed.

Mork
Enthusiast

Interesting, just came across this thread as I'm experiencing the error on one of my clusters after my VC 2.0.2 upgrade.

The host that is reporting the error is actually missing the sourceType = isolation section from /opt/LGTOaam512/bin/aam_config_util.def.

I've just copied the section over from one of the other hosts in the cluster and reconfigured HA, but it just removed the section again...

I also (previously) disabled HA and re-enabled it on the cluster to try to resolve the issue, but all it did was move the issue from one host to another.

I did this twice with the same result, so no fix yet but I'm still trying...
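For anyone wanting to compare hosts the same way, this is roughly what I did (esxgood/esxbad are placeholders for a working host and the broken one, and it assumes root ssh to the service console is enabled):

----

# dump the isolation section from both hosts and diff them
ssh root@esxgood 'grep -A4 "sourceType *= *isolation" /opt/LGTOaam512/bin/aam_config_util.def' > good.txt
ssh root@esxbad 'grep -A4 "sourceType *= *isolation" /opt/LGTOaam512/bin/aam_config_util.def' > bad.txt
diff good.txt bad.txt

----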

Mork
Enthusiast

Ok, I just set the DG address manually via the advanced options and it fixed it.

I removed the option and reconfigured HA and it's now not working again...

Time for Maintenance Mode and a reboot I think.

Thankfully this is my Test cluster, all 3 Prod ones are working ok.


masaki
Virtuoso

Check your service console and VMkernel DG!!

Is it the same DG?

masaki
Virtuoso

Try setting a host as das.isolationaddress.

Does it work?

I think your DG is unpingable.

Maybe the network guys disabled ICMP on it.

Mork
Enthusiast

DG is definitely good, these hosts have been working since the initial install of 3.0 after I upgraded from 2.5.3.

I did double check that at the time, and they were still set correctly and route shows the correct DG on the SC.

I also previously had to convince the firewall guys to enable ICMP to the DG. Initially it was pingable, but then I had some issues and discovered at some point they had disabled ICMP (every internal network here is firewalled from every other internal network - read nightmare!). I was able to successfully ping the DG while this issue was going on.

After a reboot and reconfiguring HA on the host, it all came back good and is now working fine again.

masaki
Virtuoso

Well, I experienced the same network battle with the network guys.

You just need to negotiate a ping threshold (frequency).

Please assign points for helpful and correct answers.

see you on VMTN

admin
Immortal

Under advanced settings, if you set das.usedefaultisolationaddress to "false", then only the das.isolationaddress value will be used. I think this is what you want.
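In other words, the cluster's advanced options would look something like this (the IP is a placeholder):

----

das.usedefaultisolationaddress = false
das.isolationaddress = 10.z.z.1

----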

admin
Immortal

In addition to das.isolationaddress and das.usedefaultisolationaddress = false, in VC 2.0.2 you now have the option of adding multiple isolation addresses: das.isolationaddress1, das.isolationaddress2, etc.
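For example (addresses are placeholders):

----

das.usedefaultisolationaddress = false
das.isolationaddress1 = 10.z.z.1
das.isolationaddress2 = 10.z.z.2

----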

I am working on getting a KB published on this.

-Kris

klich
Enthusiast

Kris,

We're all looking forward to seeing the KB.

How many das.isolationaddressX entries can be created? And can they be defined per ESX host within a cluster, or only in the cluster HA configuration (I still haven't seen good documentation on this parameter)?

Thanks,

Kevin

admin
Immortal

Kevin,

VC 2.0.2 supports up to 10 isolation addresses, das.isolationaddress1 to das.isolationaddress10 (though most of the time 1 or 2 is enough). This is set at the cluster level and not the ESX host level.

Regards,

-Kris

admin
Immortal

One thing worth noting: the greater the number of isolation addresses, the longer it will take for a node to declare itself isolated, and (depending on the cluster settings) the longer it will take to release the .vmdk files of the VMs powered on on that host. That affects how soon the other node(s) can take over those VMs, since they cannot be powered on on the remaining cluster hosts until the .vmdk lock has been released.

While you shouldn't need any more than one or two isolation addresses, more is not necessarily better. Keep them as "close" to the ESX host and with as few components between them as possible. The goal is to reduce the possible points of failure that could lead to a false positive test of network isolation.
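As a rough illustration only (sequential pings from the service console, not the actual AAM code path; 10.y.y.254 is a made-up third address): every address that does not answer adds roughly its ping timeout to the overall check.

----

# time sequential pings with a 2-second deadline against three candidate isolation addresses;
# each unreachable address adds ~2 s to the total
time for addr in 10.z.z.1 10.x.x.254 10.y.y.254; do
  /bin/ping -c1 -w2 "$addr" > /dev/null 2>&1 && echo "$addr: reply" || echo "$addr: no reply"
done

----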

Marc
