VMware Cloud Community
admin
Immortal

ESX HA vs. MS Cluster

I'm migrating three file servers to VMs. We support about 450 users and 13 TB of data across these three servers. I'm torn between setting up an MS cluster and relying on ESX HA for hardware redundancy only.

This is my thought process. 99.99% uptime allows only about 4.3 minutes of downtime per month (roughly 52 minutes per year); about 40 minutes per month corresponds to 99.9%. If a server is tightly controlled from a change perspective (software installation, patch updates, config changes, etc.), it should remain stable. We separate our servers by role and do not add anything extra to them. For example, 2 of our 3 file servers have been up for over 170 days. These are Win2k8. We don't apply Windows updates needlessly, only when they are specifically relevant. So I think 99.99% uptime is realistically achievable. The obvious downside of going without MS clustering, though, is the service interruption caused by restarts for configuration changes or patch updates during production hours, which in our case are 0600 to 2330. In our remote file server cluster (at a separate site) we can currently fail over the nodes mid-day with less than 10 to 15 seconds of interruption to users; sometimes they just need to reconnect to the share they were in.
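The downtime budget behind an availability target is simple arithmetic and worth checking before committing to an SLA. The sketch below is a minimal illustration assuming a 30-day month and a 365-day year; your SLA may define the measurement period differently.

```python
# Downtime budget implied by an availability target.
# Assumption: a 30-day month and a 365-day year; adjust to your SLA's definition.

MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600


def downtime_budget(availability_pct: float) -> tuple[float, float]:
    """Return (minutes per month, minutes per year) of allowed downtime."""
    unavailable = 1 - availability_pct / 100
    return MINUTES_PER_MONTH * unavailable, MINUTES_PER_YEAR * unavailable


for target in (99.9, 99.99):
    per_month, per_year = downtime_budget(target)
    print(f"{target}%: {per_month:.1f} min/month, {per_year:.1f} min/year")
```

Under those assumptions, 99.9% allows about 43.2 minutes per month (525.6 per year) and 99.99% about 4.3 minutes per month (52.6 per year), so a single mid-day reboot for patching can use up much of a 99.99% monthly budget.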

The advantages of HA alone, though: less reliance on the node pairs being identical in configuration, patches, revision levels, etc. When something goes wrong with a cluster, there is always the additional complexity of troubleshooting and resolving the problem. With ESX and HA it's very straightforward. Likewise, snapshots are available before patches or config changes, and we use Veeam to back up our VMs; all of this is available in a clustered solution as well, but HA gives it to us without the added complexity.

We attach to our volumes using the Microsoft iSCSI Initiator.

What are others' thoughts on this? What have you done? How did you rule out MS clustering, aside from licensing?

19 Replies
NuggetGTR
VMware Employee

I would treat them the same as physicals, as you do now: three machines running MSCS and HA. There are a few quirks when running MSCS in VMs, but it's generally fine. With HA alone, by default it takes 15 seconds (which I increase) before HA kicks in, then however many seconds the VM takes to boot, whereas from what you're saying your current MSCS setup gives a shorter outage.
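The HA-versus-MSCS timeline can be written out as a back-of-the-envelope sum. This is a rough sketch only: the 15-second detection time is the default mentioned in this reply, the 15-second MSCS figure is taken from the 10 to 15 seconds the original poster reports for the remote cluster, and the boot and service-start durations are hypothetical placeholders you would replace with your own measurements.

```python
# Back-of-the-envelope outage estimate for one ESX host failure.
# detection_s = 15 is the default mentioned in the thread; boot_s and
# service_start_s below are hypothetical values, not measurements.


def ha_outage_seconds(detection_s: float, boot_s: float, service_start_s: float) -> float:
    """HA restart: detect the failed host, cold-boot the VM on another host,
    then wait for the file-sharing service to come up."""
    return detection_s + boot_s + service_start_s


def mscs_outage_seconds(failover_s: float) -> float:
    """MSCS failover: the surviving node takes over the clustered resources,
    so no guest OS boot is on the critical path."""
    return failover_s


ha = ha_outage_seconds(detection_s=15, boot_s=90, service_start_s=30)
mscs = mscs_outage_seconds(failover_s=15)
print(f"HA restart: ~{ha:.0f} s, MSCS failover: ~{mscs:.0f} s")
```

The comparison is structural rather than numeric: the HA path always includes a guest boot while the MSCS path does not, so raising the detection time lengthens only the HA side.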

The only way I would forgo MSCS altogether is with VMware FT. I'm upgrading my environment to vSphere 4 at the moment, so I haven't played with FT much, but theoretically it could give near 100% uptime.

________________________________________ Blog: http://virtualiseme.net.au VCDX #201 Author of Mastering vRealize Operations Manager
AndreTheGiant
Immortal

IMHO in most cases I prefer the HA solution: if you can tolerate a few minutes of downtime, it is very simple to implement and manage.

But for a critical DBMS (or a similar product) it cannot protect database integrity (because, from the application's point of view, the VM simply crashes), so in this case a guest cluster solution could be better (it is also faster, since only the single service has to restart).

Now there is also VMware FT, which could be very nice in this case, but it currently has a big limitation: it supports only a single vCPU.

See also the reference in:

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
mreferre
Champion

My original thoughts:

http://it20.info/blogs/main/archive/2008/03/26/102.aspx

Massimo.

Massimo Re Ferre' VMware vCloud Architect twitter.com/mreferre www.it20.info
savantsingh
Enthusiast

You can use a combination of both HA and MSCS. Just configure your VMs so that they run on different hosts.
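Keeping cluster nodes on different hosts is normally enforced with a DRS anti-affinity ("separate virtual machines") rule. As a minimal illustration of what that rule guarantees, the sketch below checks a placement map for hosts running more than one node of the same guest cluster; the VM and host names are hypothetical, and in practice the placement would come from vCenter.

```python
from collections import defaultdict


def colocated_nodes(placement: dict[str, str], cluster_nodes: list[str]) -> dict[str, list[str]]:
    """Return {host: [nodes]} for every host running more than one node
    of the given guest cluster. `placement` maps VM name -> ESX host name."""
    by_host: dict[str, list[str]] = defaultdict(list)
    for vm in cluster_nodes:
        by_host[placement[vm]].append(vm)
    # Only hosts carrying two or more nodes violate the anti-affinity intent.
    return {host: vms for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical placement: two nodes of the file-server cluster share esx01.
placement = {"fs-node1": "esx01", "fs-node2": "esx01", "fs-node3": "esx02"}
print(colocated_nodes(placement, ["fs-node1", "fs-node2", "fs-node3"]))
```

An empty result means a single host failure can take down at most one node, which is what lets MSCS fail over while HA restarts the lost VM.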

You will find different arguments for implementing HA or MSCS, but I would base the decision on your RTO (recovery time objective).

You could also look at having a mix of physical and virtual cluster.

These docs would give you more insight:

http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_mscs.pdf

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100461...

amvmware
Expert

My preference would be to use vSphere features such as VMotion and HA, for the following reasons.

1. I can make all my servers highly resilient - Not just the business critical systems.

2. To get clustering I have to purchase Windows Server Enterprise Edition, whereas I have already paid for resilience and redundancy as part of my vSphere licenses.

MS clustering is now far easier to set up than in previous versions and less of a minefield when it comes to the HCL, but at the end of the day you will be deploying an active/passive cluster that protects you from a hardware failure of a cluster node by restarting the virtual server on another host, and vSphere offers this and more.

SurfControl
Enthusiast

Good topic. I have the same question and I'm debating these options as well. ESX HA only protects you from a hardware failure, whereas MSCS provides HA against both hardware failure and software failure/corruption.

I guess the bottom line is: how much downtime can you afford? What about data loss? With the ESX HA-only option, if a VM's OS becomes corrupted, how long will it take you to recover from that point?

NuggetGTR
VMware Employee

Exactly, that's why I treat the VMs the same as physical boxes: I still run MSCS, but I also have the VMs covered by HA. I know this will depend on licensing, but if you've already got MSCS running, why not keep it running within a virtual environment? Plus, as you mentioned, MSCS will cover your arse in the event that the OS craps itself. Currently I have file/print, Exchange, SQL and DNS all using MSCS, running on ESX and covered by HA and DRS as well. Everything is working fine so far, which is nice.

amvmware
Expert

MSCS clustering won't protect you from a software failure - a failure of the OS on a node in the cluster, yes, but not of the virtual server or the application it is running.

Surely that is what HA does as well: a failure of the ESX node will cause the VMs to restart on another node.

SurfControl
Enthusiast

Sure it will, if that application is a cluster-aware application.

amvmware
Expert

Can you name me a cluster-aware application where MSCS clustering will protect you from corruption of the application's data, such as an Exchange database or a SQL database? If the data is corrupt, then it is corrupt; MSCS won't help in this scenario.

SurfControl
Enthusiast

Sure, MSCS has its limitations, but I don't think that's the question here...

amvmware
Expert

That is exactly the question: if I can get the same level of service availability from HA as I can from MS clustering, namely the ability to reboot my virtual server on another host, why deploy clustering and add another layer of complexity?

mreferre
Champion

> Can you name me a cluster aware application were MSCS clustering will protect you from a corruption of the data in the application - an exchange database or a sql database. - If the data is corrupt then it is corrupt - MSCS won't help with this scenario.

I guess people were referring to failure of the application code running on the nodes, not the data the application uses.

Massimo.

SurfControl
Enthusiast

Reboot your guest on another host? Sure, but what if that guest system becomes unbootable after patching...?

IMO, again, this goes back to the SLA question: if you can afford the downtime/service interruption caused by, for example, OS corruption, then ESX HA is good enough for you.

For people who cannot afford the downtime, or who are trying to keep it to a minimum, MSCS does offer an additional layer of protection.

mreferre
Champion

> reboot your guest off another host? sure, what if that guest system becomes unbootable after the patching...?

Revert to the snapshot you took before applying the patch.

Obviously MSCS on top of a virtualized infrastructure has its own advantages, but it comes at a cost: more complexity, more certification/supportability checklists, etc.

Massimo.

SurfControl
Enthusiast

>> reboot your guest off another host? sure, what if that guest system becomes unbootable after the patching...?

> Reverting the snapshot you have taken before applying the patch.

That's the next point I was trying to make: the downtime/service interruption introduced by these kinds of reboots due to patching, service pack installs, VMware Tools upgrades, virtual hardware version upgrades, etc. Basically, it's just not acceptable to our clients...

Has anyone tried a Windows Server 2008 SP2 upgrade? You know how long that takes...

mreferre
Champion

I am with you.

I was just offering another point of view. If you read the link I posted above, I called out those differences in my blog post.

The world is not black or white, so there isn't one size that fits all.

Massimo.

msemon1
Expert

I am looking at setting up a SQL cluster; however, the documentation says you lose the ability to use DRS/HA and snapshots. Does this change in vSphere, or are you doing this with ESX 3.5? The big issue for us is that with MSCS clusters there is fault tolerance if the OS or application crashes. With HA/DRS, to my understanding, the VM is only restarted if the ESX host crashes.

Mike

SurfControl
Enthusiast

vSphere MSCS Setup Limitations:

Before you set up MSCS, review the list of functionality that is not supported for this release, and any requirements and recommendations that apply to your configuration. The following environments and functionality are not supported for MSCS setups with this release of vSphere:

1. Clustering on iSCSI, FCoE, and NFS disks.

2. Mixed environments, such as configurations where one cluster node is running a different version of ESX/ESXi than another cluster node.

3. Use of MSCS in conjunction with VMware Fault Tolerance.

4. Migration with VMotion of clustered virtual machines.

5. N-Port ID Virtualization (NPIV).

6. With native multipathing (NMP), clustering is not supported when the path policy is set to round robin.

7. You must use hardware version 7 with ESX/ESXi 4.0.
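A list like this lends itself to a pre-flight check before building a guest cluster. The sketch below encodes only the seven items quoted above; the `MscsNodeConfig` fields, value strings such as `"round_robin"`, and the messages are hypothetical, and VMware's MSCS setup documentation for your exact release remains the authoritative source.

```python
from dataclasses import dataclass


@dataclass
class MscsNodeConfig:
    """Hypothetical description of one clustered VM and the host it runs on."""
    esx_version: str            # e.g. "4.0"
    shared_disk_protocol: str   # e.g. "FC", "iSCSI", "FCoE", "NFS"
    fault_tolerance: bool       # VMware FT enabled on this VM
    vmotion_allowed: bool       # VM may be migrated with VMotion
    npiv: bool                  # N-Port ID Virtualization in use
    path_policy: str            # NMP path selection policy, e.g. "fixed", "round_robin"
    hardware_version: int       # virtual hardware version


UNSUPPORTED_PROTOCOLS = {"iSCSI", "FCoE", "NFS"}


def check_cluster(nodes: list[MscsNodeConfig]) -> list[str]:
    """Return one message per violated item from the vSphere 4.0 list above."""
    problems = []
    # Item 2 is about the cluster as a whole, so compare versions across nodes.
    if len({n.esx_version for n in nodes}) > 1:
        problems.append("cluster nodes run different ESX/ESXi versions")
    for i, n in enumerate(nodes, start=1):
        if n.shared_disk_protocol in UNSUPPORTED_PROTOCOLS:
            problems.append(f"node {i}: clustering on {n.shared_disk_protocol} disks")
        if n.fault_tolerance:
            problems.append(f"node {i}: MSCS combined with VMware FT")
        if n.vmotion_allowed:
            problems.append(f"node {i}: VMotion migration of a clustered VM")
        if n.npiv:
            problems.append(f"node {i}: NPIV in use")
        if n.path_policy == "round_robin":
            problems.append(f"node {i}: round robin path policy with NMP")
        if n.esx_version.startswith("4.0") and n.hardware_version != 7:
            problems.append(f"node {i}: hardware version {n.hardware_version}, 7 required on 4.0")
    return problems


good = MscsNodeConfig("4.0", "FC", False, False, False, "fixed", 7)
bad = MscsNodeConfig("4.0", "iSCSI", False, False, False, "round_robin", 7)
print(check_cluster([good, bad]))
```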
