Hello,
We have an MS cluster set up in VMware on 1 physical box. It consists of 2 VMs, 1 database and 1 application server. The OS is W2K Server and the SQL version is SQL 2005 Standard Edition. We've had this cluster in production for over a year. Last week, however, we updated the ESX host that the machines run on. We booted the VMs and the cluster came up fine, with no problems or error messages at all. We took an extra backup of the databases and there were no problems with it, so everything looked fine.
The next day, however, we noticed that when we get high I/O on the server, SQL Server logs error 170 (which is "Requested resource is in use") and SQL Error 9001. The server also logs a "{Lost Delayed-Write Data}" error in the Event Viewer. When I google the errors, most of the answers say that it is possibly some kind of hardware error on the disk.
The disks in the MS cluster are set up as .vmdk files on our iSCSI VMFS. The SCSI controller used is BusLogic (since these are W2K servers).
Has anyone run into something similar? Or does anyone have any idea what we can do about this?
What did you update on the ESX hosts?
oh, I forgot to mention that lol...
We updated our hosts from ESX 3.0.2 to ESX 3.5 Update 1.
Are you running active/active or active/passive? Do you see any SCSI errors in the /var/log/vmkernel logfile?
-KjB
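For anyone who wants to repeat the log check above without reading the file by hand, here is a minimal sketch. It assumes Python 3 on a workstation with a copy of the log, and it assumes that relevant lines mention "scsi" together with a word such as "warning", "error", "abort", "reset" or "timeout"; the real vmkernel message format may differ, so treat the keyword list as a starting point. The names `find_scsi_problems` and `scan_log` are hypothetical helpers, not part of any VMware tool.

```python
import re

# Assumption: SCSI-related log lines contain the word "scsi" (any case).
SCSI_PATTERN = re.compile(r"scsi", re.IGNORECASE)
# Assumption: problem lines contain one of these keywords; adjust as needed.
PROBLEM_PATTERN = re.compile(
    r"warning|error|fail|abort|reset|timeout|timed out", re.IGNORECASE
)


def find_scsi_problems(lines):
    """Return (line_number, text) for lines matching both patterns."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SCSI_PATTERN.search(line) and PROBLEM_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/vmkernel"):
    """Scan a log file; undecodable bytes are replaced rather than fatal."""
    with open(path, "r", errors="replace") as handle:
        return find_scsi_problems(handle)
```

Calling `scan_log("vmkernel-copy.log")` (a hypothetical file name) returns a list of matching lines with their line numbers; an empty list only means that none of the assumed keywords appeared.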
We run the SQL instance in active/passive.
The disks are also active/passive; however, they share the same disk controller.
In the /var/log/vmkernel I see no SCSI warnings at all.
I would run a couple of tests using iometer and/or sqlio tools. You may be running into some latency issues. What kind of SAN are you using, and have you checked your SAN for issues as well?
-KjB
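As a rough first look before setting up Iometer or SQLIO, a small script can time synchronous writes on the disk in question. This is only a sketch under stated assumptions: Python 3 inside the guest, a hypothetical helper name `measure_write_latency`, and an 8 KB block size chosen to match SQL Server's page size. It does not reproduce a real SQL workload and is not a substitute for the tools named above.

```python
import os
import tempfile
import time


def measure_write_latency(directory, block_size=8192, count=200):
    """Write `count` blocks with a flush after each; return seconds per write."""
    block = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the write out of the OS cache
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the temporary test file
    return latencies


def summarize(latencies):
    """Return average, 95th percentile and maximum latency in milliseconds."""
    ordered = sorted(latencies)
    return {
        "avg_ms": 1000 * sum(ordered) / len(ordered),
        "p95_ms": 1000 * ordered[int(0.95 * (len(ordered) - 1))],
        "max_ms": 1000 * ordered[-1],
    }
```

Running `summarize(measure_write_latency(some_directory))` on the clustered disk and again on a disk that is known to be healthy gives a crude comparison; a very large maximum relative to the average would fit the latency theory, though it would not prove it.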
Jonas_B.. did you ever find a solution for this?
Hello,
No, we did not find a solution. We ended up installing a new SQL server on Win2K3 without MSCS and moved all the databases there.
I had a similar issue on a physical server that was clustered. I found that for some reason the storage group had moved to the opposite node and was no longer accessible to the primary cluster node. Once I ensured they were both on the correct node, the problems went away. Perhaps this issue is somehow similar?