VMware Cloud Community
Froste
Contributor

Issue with losing pings when taking a snapshot or using VCB

I am having a serious issue with my VI3 environment when I attempt to use snapshots or VCB. I lose from 7 to 21 pings whenever a snapshot is created or removed, which causes my frontend applications to lose their connection to my backend database.

The issue seems to become more noticeable with the size of the VM being snapshotted. For example, a print server with 1 CPU loses 7 pings.

Here is my current environment:

24x IBM HS21 blades, each with 16GB of RAM and 2x quad-core Intel Xeon 2.33 GHz (8 virtual CPUs), connected to an IBM DS8100 SAN.
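
To put firm numbers on the outage, it may help to run a continuous ping against the VM while the snapshot is created or removed, save the output, and then count the gaps. The following is a minimal sketch, not a definitive tool. It assumes Linux-style ping output (lines containing "bytes from" and an `icmp_seq=` field) and a 1 second ping interval; the function names are made up for illustration. Windows ping output has a different format and would need a different parser.

```python
import re

# Matches the sequence number in a Linux-style ping reply line (assumed format).
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def lost_ping_runs(ping_output):
    """Return (first_missing_seq, count) for each run of consecutive missing replies.

    Only gaps between two received replies are detected; losses before the
    first reply or after the last one are not visible to this function.
    """
    received = set()
    for line in ping_output.splitlines():
        # Count only real replies, not "Destination Host Unreachable" notices.
        if "bytes from" not in line:
            continue
        match = SEQ_RE.search(line)
        if match:
            received.add(int(match.group(1)))

    seqs = sorted(received)
    runs = []
    for prev, cur in zip(seqs, seqs[1:]):
        gap = cur - prev - 1
        if gap > 0:
            runs.append((prev + 1, gap))
    return runs


def summarize(ping_output, interval_s=1.0):
    """Summarize total lost pings and the longest outage in seconds."""
    runs = lost_ping_runs(ping_output)
    longest = max((count for _, count in runs), default=0)
    return {
        "runs": runs,
        "total_lost": sum(count for _, count in runs),
        "longest_outage_s": longest * interval_s,
    }
```

Comparing the timestamps of the reported runs with the snapshot create and remove times would show whether the loss lines up with both operations or only one of them.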

11 Replies
fhpaschen
Enthusiast

How is your VMotion VLAN configured? Is it dedicated? Shared with other services on the same virtual switch? Is it gigabit or 100Mbps?

Froste
Contributor

Here is my VLAN layout. It is gigabit and shared with the VMotion service.

petedr
Virtuoso

When you take the snapshot, is the 'Snapshot the virtual machine's memory' box checked? If so, try unchecking it.

www.thevirtualheadline.com www.liquidwarelabs.com
fhpaschen
Enthusiast

Depending on how busy your LAN is, my guess would be that things are a little congested with VMotion sharing the link with your LAN traffic. If you were able to dedicate a NIC to VMotion on each host (even temporarily) as a test, you could prove or disprove that theory. I believe it is recommended that VMotion be on a separate LAN/VLAN for both performance and security.

Froste
Contributor

I have attempted the snapshot both with and without that checkbox selected.

petedr
Virtuoso

OK, I thought there was a slight VM freeze when snapshotting the memory, but if you are getting the same results with it unchecked, then something else is obviously going on.

Froste
Contributor

I have created a new vSwitch, used my inactive NIC, and assigned the VMotion VMkernel port to it. The results were the same. If you have a different setup idea, please post.

TheRealJason
Enthusiast

Are you snapshotting the backend database? I suspect that you are running into a problem with VMware Tools trying to quiesce the filesystem when you take the snapshot. This is especially prevalent on DB and Active Directory servers. Since the filesystem is written to so often, the tools have trouble quiescing it to get a clean snapshot, and that causes problems. One option is to disable the quiescing, but then you only get a crash-consistent snapshot rather than a cleanly quiesced one to restore from. Another option is to stop the DB while the snapshot is being taken, but realistically that isn't an option either.

Maybe try a combination: dump the database out to a flat file, then snapshot with quiescing disabled, so that you can at least restore the DB from the dump if you have to go back.
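
The ordering matters here: the snapshot should only be taken once the dump has succeeded. A rough sketch of that sequencing follows, under several assumptions. The snapshot command uses the ESX 3.x service console form `vmware-cmd <cfg> createsnapshot <name> <description> <quiesce> <memory>` as I recall it, so check it against the usage output of `vmware-cmd` on your host. The dump command is whatever your database provides and is passed in by the caller; the function names and the `runner` callback are hypothetical. Where each command actually executes (inside the guest or on the ESX service console) is left to the runner.

```python
def snapshot_plan(vmx_path, snapshot_name, dump_cmd, description="pre-backup"):
    """Return the commands to run, in order: database dump, then unquiesced snapshot."""
    if not dump_cmd:
        raise ValueError("a database dump command is required before an unquiesced snapshot")
    # Assumed syntax: vmware-cmd <cfg> createsnapshot <name> <description> <quiesce> <memory>
    # quiesce=0 skips filesystem quiescing, memory=0 skips the memory snapshot.
    snapshot_cmd = [
        "vmware-cmd", vmx_path, "createsnapshot",
        snapshot_name, description, "0", "0",
    ]
    return [list(dump_cmd), snapshot_cmd]


def run_plan(plan, runner):
    """Run each command through runner(cmd) -> exit code; stop at the first failure."""
    completed = []
    for cmd in plan:
        exit_code = runner(cmd)
        if exit_code != 0:
            raise RuntimeError(
                "command failed with exit code %d: %s" % (exit_code, " ".join(cmd))
            )
        completed.append(cmd)
    return completed
```

Passing `subprocess.call` as the runner would execute the plan locally; passing a function that only prints each command gives a dry run.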

Froste
Contributor

No, the backend database resides on a physical server. It is on a beefy 4-CPU server with 32GB of RAM and redundant NICs.

This issue occurs on all of my VM servers regardless of function, from a print server to an application server.

I would be happy to try disabling the quiescing. Could you please send me a link to a knowledge base article with instructions?

TheRealJason
Enthusiast

If it is happening to all of your servers, regardless of the role each one serves, then I would hesitate to blame the file sync driver. I think the other posters were probably on the right track with it being some type of misconfiguration or unoptimized configuration somewhere. Snapshotting should not always cause those outages.