From the paper on configuring MSCS across physical hosts on ESX 3.5, it is stated that RDMs must be used for the quorum and all other shared disk volumes.
I have been wondering why the quorum and shared storage can't be on VMFS3 partitions instead. I have a large number of MSCS clusters to set up, and using VMFS3 for shared storage will definitely make the SAN administration a lot easier. It would also avoid the large number of LUNs that would have to be configured on my storage controller, with the risk of exceeding the maximum number of LUNs it can support.
Hence, I have set up a 2-node MSCS cluster across 2 different hosts, using a shared VMFS3 partition on a shared iSCSI target. The shared VMDK files (of type eagerzeroedthick) for the quorum and shared storage were created at the service console using vmkfstools. The SCSI controller for the quorum disk and shared disks is of type "LSI Logic" with bus sharing in "Physical" mode.
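For reference, creating those shared disks from the service console looks roughly like the following (datastore path, VM folder, and sizes are placeholders, not my actual values):

```sh
# Create a 1 GB eagerzeroedthick disk for the quorum on the shared VMFS3 datastore
vmkfstools -c 1G -d eagerzeroedthick /vmfs/volumes/shared-ds/node1/quorum.vmdk

# Create a larger eagerzeroedthick disk for the shared application data
vmkfstools -c 50G -d eagerzeroedthick /vmfs/volumes/shared-ds/node1/data.vmdk
```

Both disks are then attached to a second SCSI controller on each node so they sit on their own bus, separate from the boot disk.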
The rest of the configuration followed the VMware MSCS configuration guide and the Microsoft MSCS configuration guides. The 2 guest OSes are Windows Server 2003 Enterprise Edition.
The MSCS installation and configuration completed successfully, and my initial testing looks good, with the nodes able to fail over well in both planned and unplanned scenarios.
So my big question is... what's wrong with using VMFS3 for the quorum (and shared storage) instead of RDMs??
I forgot to add that I am using ESX 3.5 Update 3 and vCenter 2.5.
Try taking a VCB backup or a standalone snapshot, and see what happens :smileymischief:
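For anyone who wants to try it, a snapshot attempt from the service console looks roughly like this (the VMX path is a placeholder; the arguments are snapshot name, description, and the quiesce/memory flags):

```sh
# Attempt a snapshot of one cluster node before a failover test
vmware-cmd /vmfs/volumes/shared-ds/node1/node1.vmx \
    createsnapshot pre-test "before failover test" 0 0
```

On a VM whose shared disks sit on a physical-mode shared bus, this is where the difference from a standalone VM shows up.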
Ohh thanks! I am starting to see some differences between the 2 implementations!
I don't use VCB for my backups (luckily?). Correct me if I am wrong, but I have the impression we cannot take a snapshot of RDM disks either?
Are there any other areas I should look out for?
I think you can snap raw disks if you set the compatibility mode to virtual... but I must admit I have never tried it...
This could be a little off topic... just my 2 cents on your configuration...
MSCS is not supported by ESX on iSCSI, only on FC...
cheers
\aleph0
____________________________
###############
If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!
Please pardon me, but what is meant by "snap raw disks"?
I am really curious why it isn't supported on iSCSI by VMware. Having MSCS restricted to FC RDMs is just so frustrating...
I wonder if I am missing something critical with MSCS on iSCSI....
Could you please tell me what services you are running on your virtualized MSCS?
Thank you
\aleph0
vCenter, Exchange, IIS, MSSQL, Oracle, Websphere.
Why are you clustering vCenter? I think that's not supported by VMware.
Does the clustered SQL instance maintain the VC DB?
Do you trust your configuration? Is it in production?
\aleph0
You mean VMware does not support clustering vCenter, even though VMware has released a paper on clustering vCenter (http://www.vmware.com/resources/techresources/945) and it is recognized as a "proven practice" at VIOPS (http://viops.vmware.com/home/docs/DOC-1104)?? I seriously need to go back to reading up on the support VMware provides.
The clustered SQL instance maintains the VC DB as well as the databases for other applications.
The configuration looks fine so far. I am doing some testing and studies before concluding on its suitability for production, so I greatly appreciate all feedback! 😃
Sorry, I should clarify... RDM = Raw Device Mapping, which I refer to as a raw disk, and by "snap" I mean "take a snapshot".
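For context, the RDM compatibility mode is chosen when the mapping file is created with vmkfstools, and only virtual mode leaves the VM snapshot-capable (the device path and datastore names below are placeholders, not values from this setup):

```sh
# Virtual compatibility mode RDM (-r): snapshots are possible
vmkfstools -r /vmfs/devices/disks/vmhba1:0:3:0 \
    /vmfs/volumes/local-ds/node1/rdm-virtual.vmdk

# Physical compatibility (pass-through) mode RDM (-z): no snapshots,
# but this is the mode required for MSCS across hosts
vmkfstools -z /vmfs/devices/disks/vmhba1:0:3:0 \
    /vmfs/volumes/local-ds/node1/rdm-physical.vmdk
```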
In my opinion:
1 virtual machine with vCenter (but not in a cluster).
1 physical SQL server (maybe in a cluster) with the VC DB and any really disk-I/O-intensive DBs servicing your enterprise.
1 virtual SQL server with the non-disk-I/O-intensive DBs.
The rest is up to you. I would not go for clustering inside ESX; moreover, the next generation of ESX will provide a Fault Tolerance feature for VMs that will give you business continuity instead of just high availability (HA features).
HTH
\aleph0
Ohh... I didn't know that... It will be useful for my other non-clustered VMs with RDMs... I shall try that tomorrow! 😃
I wish the Fault Tolerance feature gets released soon! That will certainly solve a lot of my problems! But I heard that the FT feature currently supports only VMs with 1 vCPU... that will probably pose an issue for some of my VMs...
On a side note: I like your blog! And I have been wanting to pick up PowerShell since ages ago!
While you can create cluster disks on VMFS storage, it is not supported. Since both cluster nodes mount the same shared disk, a triggered I/O to that disk could cause one cluster node to think the other node is down, and the same can occur if you have extended issues with vMotion or a SAN connectivity problem. This can cause a split-brain scenario: the passive node tries to take over the disk, but then the I/O issues lapse and the active node comes back online. Depending on the timing, you can end up with data corruption on the shared disk. This is why clustering across hosts requires a physical-mode RDM, which in turn can preclude those VMs from vMotion due to the physical bus sharing.
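As a sketch of what the supported cross-host configuration ends up looking like, the shared disks sit on their own controller with physical bus sharing in each node's .vmx file (the RDM filename below is a placeholder; the exact entries come from following the VMware MSCS setup guide):

```
# Dedicated controller for the quorum/shared disks
scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogic"
scsi1.sharedBus = "physical"

# Quorum disk mapped via a physical-mode RDM file
scsi1:0.present = "TRUE"
scsi1:0.fileName = "quorum-rdm.vmdk"
```

With `sharedBus = "physical"`, the SCSI reservations from the guests pass through to the array, which is what lets MSCS arbitrate disk ownership safely across hosts.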
-KjB
VMware vExpert 2009
Why are you using multiple vCPUs? For which services?
Do the servers have applications that are SMP-aware?
If not, always use 1 vCPU.
Moreover, remember that the Windows kernel is selected as uniprocessor or multiprocessor at installation time: if you install with the multiprocessor kernel and later revert the machine to 1 vCPU, the VM will have execution overhead.
Cheers
\mf
Please pardon me as I am not very familiar with MSCS and storage configurations, but what is meant by "a triggered I/O", and how does an RDM prevent this from occurring?
All my Oracle instances have at least 2 vCPUs allocated, if not 4.
We started off with 1 vCPU for all the instances and have steadily increased the allocations after analyzing the performance data collected over the past year or so.