The express patches have been posted. This thread is long.
Please post technical experiences here and non-technical feedback here. --JohnTroyer
Hi all,
We've just encountered a serious bug with our ESX cluster - serious enough that I thought I should post about it here as a prior warning for others running ESX 3.5 Update 2.
The VMware tech support person we spoke to wouldn't 100% confirm whether this was / would be affecting all ESX 3.5u2 installs, but he strongly implied that it was widespread. For others' sake I hope I'm wrong and it's limited.
The bug:
Starting this morning, we could neither power on nor VMotion any of our Virtual Machines. The VI Client threw the error "A general system error occurred: Internal Error".
Further digging led us to messages like this one in /var/log/vmware/hostd.log, and in the log file for any virtual machine we tried to power on or VMotion:
Aug 12 10:40:10.792: vmx| This product has expired.
Aug 12 10:40:10.792: vmx| Be sure that your host machine's date and time are set correctly.
Aug 12 10:40:10.792: vmx| There is a more recent version available at the VMware Web site: "http://www.vmware.com/info?id=4".
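To check quickly whether a host is hitting this, you can search the logs for the expiry message. A minimal sketch, assuming a grep that supports -H and -F; the function name find_expiry_errors is made up for illustration and is not a VMware tool:

```shell
#!/bin/sh
# Print every "This product has expired" line found in the given log files.
# find_expiry_errors is a hypothetical helper name, not part of ESX.
# Usage: find_expiry_errors FILE...
find_expiry_errors() {
    # -H prefixes each match with its file name,
    # -F treats the pattern as a fixed string rather than a regex.
    grep -HF "This product has expired" "$@"
}
```

For example: find_expiry_errors /var/log/vmware/hostd.log. If nothing matches, grep prints nothing and returns a non-zero exit status.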
A call to tech support confirmed this as a known problem with a temporary workaround.
The work-around:
Turn off NTP (if you're using it), and then manually set the date of all ESX 3.5u2 hosts back to 10th of August. This can be done either through the VI Client (Host -> Configuration -> Time Configuration) or by typing date -s "08/10/2008" at the Service Console command line on the ESX hosts.
As soon as the date was reset to the 10th - problem solved.
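The workaround above could be scripted roughly like this on the Service Console. This is only a sketch: the date command is the one given above, but "service ntpd stop" as the way to stop NTP is my assumption, and the helper names (run, apply_date_workaround) and the DRY_RUN switch are made up for illustration. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the workaround: stop NTP, then set the host clock back.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hypothetical helper: apply the workaround, defaulting to 10 August 2008.
apply_date_workaround() {
    target_date="${1:-08/10/2008}"
    # Assumption: NTP runs as the ntpd service on the Service Console.
    run service ntpd stop
    # Same command as given in the workaround text above.
    run date -s "$target_date"
}
```

Calling apply_date_workaround with no argument uses 08/10/2008. Leave NTP stopped afterwards, otherwise it will pull the clock forward again.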
Note that running VMs were operating fine; this only seems to affect initial VM power-on (including from suspended state) and VMotion.
So, it sounds like a serious licensing bug has crept into 3.5u2. Further testing shows that the problem begins as soon as the date hits 12th August - 10th is fine, 11th is fine, 12th and the problem appears.
There wasn't any real reference to similar problems in the forums as far as I could see, but it's quite possible we're seeing this before most of the rest of the world as we're in Australia, and therefore the date here ticked over to the 12th "before" those in Europe, America, etc.
Hope this helps others... took us a couple of hours to get this far - at least we can power on VMs again though!
Cheers,
Matt Kilham
Message was edited by: JohnTroyer to add new thread links.
Nice Daniel,
Although I would go back a year and not restart ntp.
gd
Many Thanks.
I am only running that build on 1 dev/uat esx host which is not so bad.
Can this update be uninstalled or do I just need to wait for the patch?
I don't think this is an uninstallable module. We will have to wait for the fix.
As for the VC upgrade, I would remain as is and not do any upgrades until the root cause has been resolved by VMware.
Don't know what else might be hiding.
to the guy with the CNC machines:
Set the date back on your ESX host, boot up your CNC software servers, disable VMware Tools from syncing time with the host, and set up NTP on your virtual servers. There is no need to have the virtual machines syncing their time with the hardware clock. If you have AD, your servers will sync against the nearest Global Catalog, which then syncs with the PDC emulator. Make sure your FSMO PDC emulator server has correct NTP settings. ntp.org is a good place to start.
net time /setsntp:ntpserver.domain
/Thomas
Although I would go back a year and not restart ntp.
gd
I chose 1 year to easily detect the switch in the logs. And why not restart ntpd after the power-on?
I'll hold my comments for later. I'm replying to receive email updates on this thread.
Jas
Jason Boche
VMware Communities User Moderator (http://communities.vmware.com/docs/DOC-2444)
shane.presley: I think this is an ESX-only issue. I asked VMware support and they answered:
"Hopefully, you are not going to have any problem if you update the VC server".
I'll wait - this is safe and we don't need to upgrade to U2 now.
If you have multiple virtual domain controllers, syncing them against NTP is only a suboptimal solution on VMware as the clock will not be entirely reliable unless you use the descheduled time service and your DCs might eventually get out of synch.
http://download3.vmware.com/vmworld/2006/tac9710.pdf
Lars
dalo, disregard my previous comment. Have been getting apache timeouts on my connection.
Just some information:
I encountered this problem in ESX 3i as well.
The workaround of setting the host date back to August 10 worked though; I could even set the date back to normal again after powering up some VMs.
I'm replying to receive email updates on this thread
Thanks!
I am also facing the same problem here in Dubai. But now I can start VMs by changing the date, as per your suggestion. I spent 5 hours fixing this problem.
Thanks,
Jamal
Hi Lars,
I agree, but my solution is just temporary and would get the CNC machines up and running until tomorrow, when the new release of U2 is launched?
David,
I'm running VC 2.5U2 without problems. It is managing our 18 prod servers (2 of them 3.5U2, the rest 3.5U1) and 200+ VM's.
I haven't faced any issues with Virtual Center.
I could even VMotion back from 3.5U2 to 3.5U1 to remove the two 3.5U2 hosts from the cluster.
Best regards, Ludovic
Replying to receive updates.
All you have to do is click on "Receive email notifications" to get the mail; you don't have to reply...
Jase McCarty
Co-Author of VMware ESX Essentials in the Virtual Data Center
(ISBN:1420070274) from Auerbach
Folks, just an FYI tip: It's not necessary to reply to the thread to receive updates. In the "Actions" box next to the thread, there is a link that allows you to "Receive e-mail updates."
Is the fix for this issue going to be available through the VMware Infrastructure Update tool that comes with ESXi?
Thanks.
As a few have mentioned, it does seem to me that Update 2 may very well have been installed without you wanting it. At least this happened to me. On August 4 I installed ESX 3.5 Update 1 on a server and added it to our development cluster. Since we want to have all ESX hosts at the same level, we have implemented fixed baselines in Update Manager. The last updates in the baseline were released June 12. When I chose to remediate the server, I noticed that ESX350-200806201-UG and ESX350-200806202-UG, released July 25, got installed (essentially the VMware kernel from Update 2). After testing this a few times I found that both ESX350-200804404-BG and ESX350-200804405-BG would install the two later updates. I have found no solution to this. This means that if I install either of the two updates from April, my host will upgrade to ESX 3.5 Update 2. I have looked through the descriptor.xml files from Update Manager and have not found anything odd.
I decided to keep the upgrade and updated to Virtual Center 2.5 Update 2 (VC 2.5 Update 1 does not start if you have ESX 3.5 Update 2 hosts). And now this license issue. Fortunately this only impacts a few development systems. Had this happened to my production servers it would have been much worse.
And a note to VMware: updated ISOs are not my first priority. I want updates in Update Manager.